OpenEye Scientific Software


Welcome to SAMPL-1

SAMPL is an attempt at prospectively testing protein and ligand modeling. Ideally this would consist of experiments designed to distinguish between competing ideas or methods, and perhaps over time we shall get there. For now, it is an analysis of methods on data not seen by participants, a "blind" assessment. We do not claim to have overcome the limitations of such an attempt, believing it more important to light a candle than to curse the darkness. Blind tests also help us avoid the tendency to bias our theories and approaches toward known answers, and they provide a more realistic "real world" setting for methods.

With SAMPL we intend to avoid declaring "who won, who lost". With such a small sample size, no statistically sound pronouncement of that kind could be made anyway. Rather, we see this as an opportunity for groups to test their methods, learn from the experience and share lessons learnt. At CUP8 we had a small taste of this from data sets on vacuum-water transfer and protein-ligand binding, enough to realize that the exercise was valuable. The work on solvation prediction will appear in J. Med. Chem. later this year, and we aim to publish the results of SAMPL as well. We will follow an 'opt-in' policy for any publication, i.e., results from any participant may be used, but attribution requires that participant's permission. We hope this will encourage those who feel, perhaps correctly, that they have more to lose than to gain. Details can be found in the sign-up agreement.

SAMPL-1 will consist of two sets of protein-ligand binding data, generously provided by Abbott Labs and Vertex Pharmaceuticals, and sixty-three vacuum-water transfer energies, courtesy of Peter Guthrie at the University of Western Ontario and CCG. Each of the two protein-ligand sets contains between twenty and sixty active compounds, the majority of which have protein-ligand crystal structures. The assessment will consist of three parts:

  1. Divining actives from a set of strategically chosen inactives.
  2. Predicting the binding pose of each active.
  3. Predicting the binding affinity, or rank order, of each ligand.

Accordingly, information for each target will be disclosed progressively. First, a set of actives and inactives will be made available as SD files; then a listing of the actual actives; then a listing of the poses. Each level of information corresponds to an expected test, i.e., virtual screening, followed by pose prediction, followed by affinity estimation. A participant may skip a phase, for instance by downloading the poses to go straight to affinity prediction, but doing so will void the 'upstream' assessments of pose and screening.

The vacuum-water transfer energies are provided by Peter Guthrie, with thanks to CCG, who paid a summer student to help uncover literature values far from the beaten track. Participants will receive a set of fifty SMILES strings; three-dimensional coordinates, conformations, tautomers, charge states and charge distributions must all be derived by the participants.

More details will be forthcoming, including expected formats for entries, cut-off dates for applications and expected degrees of confidentiality. The final disclosure of information (affinities and solvation energies) will occur one month before the CUP9 meeting (currently scheduled for March 16th-19th, 2008). All participants will be offered a speaking slot at the meeting, although time constraints may limit the length of each talk.

We hope SAMPL will prove useful and become a regular event. There are many variants we could try in future assessments, such as providing partial information (e.g., a subset of actives or binding modes), which is more typical of an industrial project setting. Additional physical properties, such as tautomer ratios, pKas and thermodynamic data, could also be sought. I strongly believe that a prospective component to our field will help us judge progress, and I hope you agree and can contribute.

Kind Regards,
Anthony Nicholls
President, Founder,
OpenEye Scientific Software, Inc.