Conformer generation and molecular minimisation

Note

The ccdc.conformer module is available only to CSD-Discovery, CSD-Materials and CSD-Enterprise users.

Introduction

The ccdc.conformer module can be used to generate a diverse ensemble of conformers that are optimised to reflect the geometrical preferences of molecules observed in the CSD.

The methodology requires an input molecule where all atoms have 3D coordinates. The following steps are then performed:

  • The bond distances and angles are optimised according to data from the CSD
  • Conformational space is sampled using rotamer distributions and ring templates derived from the CSD and the generated conformers are scored
  • A diverse subset is selected

The ccdc.conformer.ConformerGenerator is available in the ccdc.conformer module. Let us import this module.

>>> from ccdc import conformer

Let us also import the ccdc.io module so that we can read in a molecule of interest and define the path to a molecule file of interest, in this instance a CDK2 inhibitor.

>>> from ccdc import io
>>> filepath = '2vtp-lig.mol2'

In this instance we will get the first molecule in the file.

>>> mol_reader = io.MoleculeReader(filepath)
>>> mol = mol_reader[0]

Note

For more information on the conformer generation algorithm please see: “Knowledge-Based Conformer Generation Using the Cambridge Structural Database”, J. C. Cole, O. Korb, P. McCabe. M. G. Read and R. Taylor, J. Chem. Inf. Model., 58, 615-629, 2018, DOI: 10.1021/acs.jcim.7b00697.

Conformer Generation

Setting Up the Calculation

To generate conformers we create a ccdc.conformer.ConformerGenerator.

>>> conformer_generator = conformer.ConformerGenerator()

Note

There are optional parameters for finer control of the ccdc.conformer.ConformerGenerator; please see the API documentation for further detail.

The ccdc.conformer.ConformerGenerator has a number of settings that can be customised, for example the maximum number of conformers to generate, that is a representative set of high probability and at the same time diverse conformers. The ccdc.conformer.ConformerSettings.max_conformers setting specifies the maximum number of conformers to generate. If one specifies a large number and the conformational space is exhausted before this number is reached a smaller number of conformers will be generated. By default the conformer generator will attempt to generate 200 conformers, but this number can be changed by the user. Let us change it to 25.

>>> conformer_generator.settings.max_conformers
200
>>> conformer_generator.settings.max_conformers = 25

Another setting that can be customised is the ccdc.conformer.ConformerSettings.max_unusual_torsions, which represents the number of unusual rotamers allowed per conformer. By default this number is set to 2.

Finally, it is possible to superimpose the resulting set of conformers onto a common refernce frame, that is all atoms of the input molecule. To enable this feature:

>>> conformer_generator.settings.superimpose_conformers_onto_reference = True

To generate a set of conformers for a molecule one simply calls the ccdc.conformer.ConformerGenerator.generate() function. It is important that all atoms in the molecules, including hydrogen atoms, have coordinates. Molecules with missing coordinates are ignored.

>>> conformers = conformer_generator.generate(mol)
>>> len(conformers)
25

Note

The ccdc.conformer.ConformerGenerator.generate() function can also take a list of molecules which may give some performance improvement when working on large numbers of molecules.

../_images/conformers_onto_reference.png

Figure illustrating the set of 25 conformers superimposed onto the input molecule.

It is possible to overlay the list of conformers onto a specific set of atoms using the ccdc.descriptors.MolecularDescriptors.overlay(). For example, we can superimpose the 25 conformers onto the pyrazole substructure.

>>> from ccdc.search import SubstructureSearch, SMARTSSubstructure
>>> from ccdc import descriptors as d
>>> query = 'C1=CNN=C1'
>>> conformers_mols = [c.molecule for c in conformers]
>>> ss_search = SubstructureSearch()
>>> substructure = SMARTSSubstructure(query)
>>> sub_id = ss_search.add_substructure(substructure)
>>> hits = ss_search.search(conformers_mols, max_hits_per_structure=1)
>>> ref_ats = hits[0].match_atoms()

Let’s write the superimposed conformers to a .MOL2 file.

>>> with io.MoleculeWriter('superimposed_conformers.mol2') as writer:
...     for hit in hits:
...         hit_ats = hit.match_atoms()
...         atoms = zip(ref_ats, hit_ats)
...         superimposed_hit = d.MolecularDescriptors.overlay(hits[0].molecule, hit.molecule, atoms)
...         writer.write(superimposed_hit)
../_images/conformers_onto_pyrazole.png

Figure illustrating the set of 25 conformers superimposed onto the pyrazole scaffold.

Analysing the Conformer Hit List

The ccdc.conformer.ConformerGenerator.generate() function returns a ccdc.conformer.ConformerHitList object, which behaves like a Python list. The ccdc.conformer.ConformerHitList contains attributes relating to the overall performance of the run. For example, whether or not the sampling limit was reached and how many rotamers had no observations in the CSD.

>>> conformers.sampling_limit_reached
False
>>> conformers.n_rotamers_with_no_observations
0

An individual ccdc.conformer.ConformerHit contains a ccdc.molecule.Molecule as well as the ccdc.conformer.ConformerHit.normalised_score - a value between 0.0 (most probable) and 1.0 (least probable). Conformers are listed in probability order (the most probable first).

>>> most_probable_conformer = conformers[0]
>>> most_probable_conformer.molecule    
<ccdc.molecule.Molecule ...>
>>> '%.2f' % most_probable_conformer.normalised_score
'0.00'

It is possible to calculate the rmsd of each conformer in the hit list with respect to the input molecule (this is the default behaviour) or with respect to the minimised molecule.

>>> print(round(most_probable_conformer.rmsd(), 3)) 
0.723
>>> print(round(most_probable_conformer.rmsd(wrt='minimised'), 3)) 
0.690

Locking Rotatable Bonds

Particular bonds can be locked so that they are not rotated during conformer generation. This can make the conformer generator create more diversity in other areas of the molecule.

>>> conformer_generator.lock_torsion(mol.bond('C13', 'N14'))

Molecular Minimisation

The ccdc.conformer module also contains functionality for optimising a molecule to the geometrical preferences of bond distances and valence angles observed in the CSD.

Note

By default this is the first step in conformer generation.

The geometrical optimisation makes use of the Tripos force field functional forms and, where available, equilibrium bond distances and valence angles are parametrised using data obtained from the CSD.

>>> molecule_minimiser = conformer.MoleculeMinimiser()
>>> minimised_mol = molecule_minimiser.minimise(mol)

To compare the two molecules we can make use of the ccdc.descriptors.MolecularDescriptors.rmsd() function.

>>> from ccdc.descriptors import MolecularDescriptors
>>> round( MolecularDescriptors.rmsd(mol, minimised_mol), 3)  
0.434