Conformer API

Introduction

The ccdc.conformer module contains classes concerned with molecular conformations.

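The three main classes of the ccdc.conformer module are ccdc.conformer.MoleculeMinimiser, ccdc.conformer.ConformerGenerator and ccdc.conformer.GeometryAnalyser.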

A ccdc.conformer.MoleculeMinimiser instance can be used to optimise the bond distances and valence angles of a 3D input molecule using the ccdc.conformer.MoleculeMinimiser.minimise() function:

from ccdc.conformer import MoleculeMinimiser
molecule_minimiser = MoleculeMinimiser()
minimised_mol = molecule_minimiser.minimise(mol)

A ccdc.conformer.ConformerGenerator instance can be used to generate a set of conformers for an input molecule using the ccdc.conformer.ConformerGenerator.generate() function:

from ccdc.conformer import ConformerGenerator
from ccdc.io import MoleculeWriter
conformer_generator = ConformerGenerator()
conformers = conformer_generator.generate(mol)
with MoleculeWriter('conformers.mol2') as mol_writer:
    for c in conformers:
        mol_writer.write(c.molecule)

A ccdc.conformer.GeometryAnalyser instance can be used to analyse the geometry of an input molecule using a knowledge-based library of intramolecular geometries based on the CSD.

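The ccdc.conformer.GeometryAnalyser class contains the nested classes ccdc.conformer.GeometryAnalyser.Analysis, ccdc.conformer.GeometryAnalyser.AnalysisHit and ccdc.conformer.GeometryAnalyser.Settings.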

The ccdc.conformer.GeometryAnalyser.analyse_molecule() function can be used to validate the complete geometry of a given query structure.

>>> from ccdc.io import EntryReader
>>> csd_reader = EntryReader('CSD')
>>> yigpio01 = csd_reader.molecule('YIGPIO01')
>>> from ccdc.conformer import GeometryAnalyser
>>> analysis_engine = GeometryAnalyser()
>>> checked_mol = analysis_engine.analyse_molecule(yigpio01)
>>> for tor in checked_mol.analysed_torsions:
...     if tor.unusual:
...         print('%s: %d %.2f' % (', '.join(tor.atom_labels), tor.nhits, tor.local_density)) 
...
C36, C12, C11, N1: 72 2.78
O4, C31, N5, C24: 3743 3.55
O5, C31, N5, C24: 3736 3.93
O5, C32, C33, S1: 108 1.85
O5, C32, C33, C34: 73 4.11
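The filtering step in the example above can be wrapped in a small reusable helper. The following is a minimal sketch that relies only on the attributes used in the example (unusual, atom_labels, nhits and local_density); the helper name format_unusual is hypothetical, and the stand-in objects exist only so that the snippet runs without a CSD installation.

```python
from types import SimpleNamespace


def format_unusual(analyses):
    """Return one report line per unusual analysis, in the format printed above."""
    lines = []
    for analysis in analyses:
        if analysis.unusual:
            lines.append('%s: %d %.2f' % (
                ', '.join(analysis.atom_labels),
                analysis.nhits,
                analysis.local_density,
            ))
    return lines


# Stand-in objects (not real Analysis instances) carrying the same attributes.
demo = [
    SimpleNamespace(atom_labels=['C36', 'C12', 'C11', 'N1'],
                    nhits=72, local_density=2.78, unusual=True),
    SimpleNamespace(atom_labels=['C1', 'C2', 'C3', 'C4'],
                    nhits=500, local_density=80.0, unusual=False),
]
report = format_unusual(demo)
```

With a real analysis, checked_mol.analysed_torsions would be passed in place of demo.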

API

Knowledge base version number

ccdc.conformer._mogul_version()[source]

The version of mogul being used.

Molecule minimisation

class ccdc.conformer.MoleculeMinimiser(nthreads=1, parameter_locator=<ccdc.conformer.DefaultConformerParameterFileLocator object>)[source]

Minimises a single molecule or a list of molecules.

minimise(mol)[source]

Return a minimised copy of the input molecule.

This makes use of the Tripos force field functional forms.

However, where available, equilibrium bond distances and valence angles are parameterised using data obtained from CSD distributions.

Parameters

mol – ccdc.molecule.Molecule

Returns

ccdc.molecule.Molecule

Conformer generation

Note

The ConformerGenerator class is available only to CSD-Discovery, CSD-Materials and CSD-Enterprise users.

class ccdc.conformer.ConformerGenerator(settings=None, skip_minimisation=False, nthreads=1, parameter_locator=<ccdc.conformer.DefaultConformerParameterFileLocator object>)[source]

Generates conformers for a single molecule or a list of molecules.

This functionality is available only under licensed conditions. Please contact support@ccdc.cam.ac.uk for details.

generate(mols)[source]

Generate conformers for supplied molecule(s).

Parameters

mols – a ccdc.molecule.Molecule or a list of ccdc.molecule.Molecule

Returns

a ccdc.conformer.ConformerHitList or a list of ccdc.conformer.ConformerHitList instances

Note that missing hydrogen atoms will be added to conformers in order to generate 3D coordinates, unless ccdc.conformer.ConformerSettings.reject_missing_hydrogen is set, in which case None is returned.

static lock_torsion(bond)[source]

Specify that a particular torsion should not be changed when generating conformers of its molecule.

If the bond is in a ring, the whole ring will be locked.

Parameters

bond – a ccdc.molecule.Bond instance.

class ccdc.conformer.ConformerHitList(identifier, _dr)[source]

A conformer generator result.

property distributions_pruned

Whether or not the geometry distributions were pruned in order to perform an exhaustive search.

property flexible_rings

The flexible rings considered by the generator.

property max_log_probability

Maximum log probability.

property min_log_probability

Minimum log probability.

property minimised_molecule

The minimised molecule from which conformers were generated.

property n_flexible_rings_in_molecule

The number of flexible rings in the molecule.

property n_flexible_rings_sampled

The number of flexible rings sampled by the generator.

This may be smaller than the number of rings in the input molecule if there are no data in the CSD for the ring.

property n_flexible_rings_with_no_observations

Number of flexible rings for which no crystallographic data is available.

property n_matched_rotamers

The number of rotamers which have been matched in the fragment_library.txt parameter file.

property n_rotamers_in_molecule

The number of rotamers in the molecule.

property n_rotamers_sampled

The number of rotamers sampled by the generator.

This may be smaller than the number of rotamers in the input molecule if there are no data in the CSD for the rotamer.

property n_rotamers_with_no_observations

Number of rotamers for which no crystallographic data is available.

property original_molecule

The input molecule.

property rotamers

The rotamers considered by the generator.

property rotamers_with_no_observations

The list of bonds for which the CSD was unable to provide enough input data.

property sampling_limit_reached

Whether the internal sampling limit has been reached.
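The counting properties above can be combined into a quick diagnostic of how much of a molecule's flexibility was actually sampled. This is a minimal sketch that reads only properties documented for ConformerHitList; the helper name sampling_report and the stand-in object are hypothetical and are used so that the snippet runs without a licence.

```python
from types import SimpleNamespace


def sampling_report(hit_list):
    """Summarise rotamer sampling for a ConformerHitList-like object."""
    n_rotamers = hit_list.n_rotamers_in_molecule
    n_sampled = hit_list.n_rotamers_sampled
    return {
        'rotamers_in_molecule': n_rotamers,
        'rotamers_sampled': n_sampled,
        'rotamers_without_data': hit_list.n_rotamers_with_no_observations,
        # A molecule with no rotamers is treated as fully sampled.
        'fraction_sampled': (n_sampled / n_rotamers) if n_rotamers else 1.0,
        'sampling_limit_reached': hit_list.sampling_limit_reached,
    }


# Stand-in for a real ConformerHitList, with made-up numbers.
fake_hit_list = SimpleNamespace(
    n_rotamers_in_molecule=4,
    n_rotamers_sampled=3,
    n_rotamers_with_no_observations=1,
    sampling_limit_reached=False,
)
summary = sampling_report(fake_hit_list)
```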

class ccdc.conformer.ConformerHit(mol, parent)[source]

An individual conformer.

property normalised_score

Normalised score associated with this conformer (0 = best, 1 = worst).

property probability

Probability associated with this conformer.

rmsd(wrt='original', reference=None, exclude_hydrogens=True)[source]

Return the RMSD of this conformer with respect to a reference, the original or the minimised molecule.

Parameters
  • reference – None or a CSD molecule object. If not None, the RMSD is measured with respect to this reference

  • wrt – either ‘original’ or ‘minimised’. This is ignored if a reference molecule is passed in

  • exclude_hydrogens – boolean

Returns

float
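A common use of normalised_score and rmsd() is to rank conformers and see how far each one has moved from the input geometry. The sketch below assumes only the documented members (normalised_score, and rmsd() with its wrt keyword); the function name summarise_hits and the stand-in hits are hypothetical.

```python
from types import SimpleNamespace


def summarise_hits(hits, wrt='original'):
    """Return (normalised_score, rmsd) pairs, best-scoring conformer first."""
    ranked = sorted(hits, key=lambda hit: hit.normalised_score)
    return [(hit.normalised_score, hit.rmsd(wrt=wrt)) for hit in ranked]


def make_fake_hit(score, rmsd_value):
    """Build a stand-in for a ConformerHit with a fixed RMSD (illustration only)."""
    return SimpleNamespace(
        normalised_score=score,
        rmsd=lambda wrt='original', reference=None, exclude_hydrogens=True: rmsd_value,
    )


fake_hits = [make_fake_hit(1.0, 2.0), make_fake_hit(0.0, 0.5), make_fake_hit(0.4, 1.2)]
ranking = summarise_hits(fake_hits)
```

With real results, the ConformerHitList returned by generate() would be passed in place of fake_hits.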

class ccdc.conformer.ConformerSettings[source]

Settings for conformer generation.

Any settings that are set to None will be set to the system defaults.

max_conformers = None

Maximum number of conformers to generate.

max_unusual_torsions = None

Number of unusual torsions allowed per conformer.

reject_missing_hydrogen = False

Whether or not to reject input molecules with missing hydrogen atoms.

superimpose_conformers_onto_reference = None

Whether or not to superimpose to a common reference.

Geometry analysis

class ccdc.conformer.GeometryAnalyser(settings=None, databases=None, ignore_updates=False)[source]

The geometry analysis engine.

class Analysis(analysis, mol, classification, settings, siteless)[source]

A single geometric analysis for a specific bond, angle, torsion or ring feature.

property atom_labels

The labels of atoms in the reference fragment.

property d_min

Return the distance to the nearest observed value.

If rawscore is not specified, the geometric value of the query fragment will be used.

property distribution

List of numeric values found by the search.

property enough_hits

Whether there are enough hits for a sound judgement.

property few_hits

Whether there are too few hits for a sound judgement.

property fragment_label

Underscore separated string of atom labels.

property generalised

Whether or not the analysis for this fragment resulted from a generalised search.

histogram(bin_size=None, minimum=None, maximum=None)[source]

Return the histogram of the distribution as a tuple of integers.

This function puts the distribution values into bins according to the criteria specified.

Parameters
  • bin_size – defaults to (maximum - minimum)/40 if set to None

  • minimum – The minimum value of the distribution range. If None, defaults to 0 for torsions, or the minimum value in the distribution (or the query fragment value if smaller) for other fragment types

  • maximum – The maximum value of the histogram range. If None, defaults to 180 for torsions, or the maximum value in the distribution (or the query fragment value if larger) for other fragment types

Returns

tuple of integers
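The default binning rules described above can be illustrated in plain Python. This is a sketch of the stated defaults only, not the library's implementation; the function name bin_distribution, the is_torsion flag and the handling of values on the upper edge are assumptions made for this illustration.

```python
import math


def bin_distribution(values, query_value, is_torsion=False,
                     bin_size=None, minimum=None, maximum=None):
    """Bin values using the defaults documented for histogram()."""
    if minimum is None:
        minimum = 0.0 if is_torsion else min(min(values), query_value)
    if maximum is None:
        maximum = 180.0 if is_torsion else max(max(values), query_value)
    if maximum <= minimum:
        # Degenerate range: everything falls into a single bin.
        return (sum(1 for v in values if v == minimum),)
    if bin_size is None:
        bin_size = (maximum - minimum) / 40
    n_bins = max(1, math.ceil((maximum - minimum) / bin_size - 1e-9))
    counts = [0] * n_bins
    for v in values:
        if minimum <= v <= maximum:
            # Assumption: a value equal to the maximum goes into the last bin.
            index = min(int((v - minimum) / bin_size), n_bins - 1)
            counts[index] += 1
    return tuple(counts)


torsions = [10.0, 10.0, 95.0, 170.0, 180.0]
bins = bin_distribution(torsions, query_value=95.0, is_torsion=True)
```

For torsions the default range of 0 to 180 gives 40 bins of 4.5 degrees each.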

property hit_identifiers

List of molecule identifiers of the hits in the distribution.

property hit_molecules

The list of molecules hit by this result.

property hits

List of ccdc.conformer.GeometryAnalyser.AnalysisHit instances found by the search.

Features of an individual hit, such as its identifier, matched atoms and source molecule, can be extracted from a ccdc.conformer.GeometryAnalyser.AnalysisHit.

For more information see the ccdc.conformer.GeometryAnalyser.AnalysisHit documentation.

property local_density

Local density of the distribution around the query value.

property lower_quartile

The lower quartile of the distribution.

property maximum

The maximum of the distribution.

property mean

The mean of the distribution.

property median

The median of the distribution.

property minimum

The minimum of the distribution.

property nhits

The number of hits in the distribution.

property no_hits

Whether the fragment has no data within the CSD.

percentile(p)[source]

Return the percentile of the observed value.

Raises

TypeError if the value (p) is not between 0 and 1.

property standard_deviation

The standard deviation of the distribution.

property type

The type of geometric feature represented by this result.

In other words was this ccdc.conformer.GeometryAnalyser.Analysis derived from a bond, angle, torsion or ring analysis.

property unusual

Check if the geometric feature is unusual or not.

If the enough_hits and few_hits parameters are both set to True (the default behaviour), this function will return True if the geometric feature is classified as unusual.

If the few_hits parameter is set to False, this function will only return True if the geometric feature is unusual and there are enough hits to support this claim.

If the enough_hits parameter is set to False, this function will only return True if the geometric feature is unusual and there are not enough hits to support this claim.

If both the enough_hits and few_hits parameters are set to False, this function will always return False.
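The four cases above amount to a small truth table, which can be written out as follows. This is an illustrative sketch of the described behaviour, not the library's code; the function name and its is_unusual and has_enough_hits arguments are hypothetical.

```python
def reported_as_unusual(is_unusual, has_enough_hits,
                        enough_hits=True, few_hits=True):
    """Combine the raw classification with the enough_hits/few_hits switches."""
    if not is_unusual:
        return False
    # A well-supported result is reported only when enough_hits is on;
    # a poorly supported result is reported only when few_hits is on.
    return enough_hits if has_enough_hits else few_hits


truth_table = {
    (enough, few): (
        reported_as_unusual(True, True, enough, few),    # enough supporting hits
        reported_as_unusual(True, False, enough, few),   # too few supporting hits
    )
    for enough in (True, False)
    for few in (True, False)
}
```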

property upper_quartile

The upper quartile of the distribution.

property value

Geometric value represented by the reference fragment.

property z_score

Return the Z-score of the observed value.
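For orientation, a Z-score is conventionally the distance of the observed value from the distribution mean, measured in standard deviations. The sketch below assumes that convention, takes the absolute deviation, and uses the sample standard deviation; whether the geometry analyser makes exactly the same choices is not stated here, so treat this as an illustration only.

```python
import statistics


def z_score(distribution, value):
    """Absolute deviation of value from the mean, in standard deviations."""
    mean = statistics.mean(distribution)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    spread = statistics.stdev(distribution)
    return abs(value - mean) / spread


# Made-up bond lengths in angstroms, purely for illustration.
bond_lengths = [1.50, 1.52, 1.54]
z = z_score(bond_lengths, 1.60)
```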

class AnalysisHit(refcode, source, value, _analysis, _distrib, _index)[source]

A single geometry analysis hit fragment.

In other words one of the observations that make up the geometry analysis distribution.

property atom_indices

The indices of the matched atoms in the hit molecule.

property atom_labels

The labels of the matched atoms in the hit molecule.

property atoms

The atoms of a hit.

property bond_length

The bond length of the hit fragment.

Raises

TypeError if the hit is not for a bond length

property crystal

The hit crystal.

property entry

The hit entry.

property identifier

The identifier of the hit.

property molecule

The hit molecule.

property similarity_score

The similarity of the matched fragment to the analysed fragment.

This will be 1.0 for an exact match, and a lower value for a generalised match.

property source_name

The name of the source of the hit.

property torsion_angle

The absolute value of the torsion angle of the hit fragment. The sign of a torsion angle calculated from a CSD entry is often arbitrary. For example, if the CSD entry is centrosymmetric, for every torsion angle with a positive sign there is, elsewhere in the unit cell, a symmetry-equivalent torsion with a negative sign. Consequently, only the absolute values of torsion angles are used.

Raises

TypeError if the hit is not for a torsion angle
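The effect of using absolute torsion values can be seen with a few signed angles. This is a minimal sketch in plain Python; the helper name fold_torsion is hypothetical and the wrapping step is an assumption added so that angles outside the range -180 to 180 are also handled.

```python
def fold_torsion(angle_degrees):
    """Map a signed torsion angle onto its absolute value in [0, 180]."""
    wrapped = (angle_degrees + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return abs(wrapped)


# A torsion and its symmetry-equivalent mirror image fold onto the same value.
signed = [60.0, -60.0, 180.0, -175.0, 300.0]
folded = [fold_torsion(t) for t in signed]
```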

property valence_angle

The valence angle of the hit fragment.

Raises

TypeError if the hit is not for a valence angle

class Settings[source]

Controls the operation of the geometry analyser.

class GeometrySettings(identifier, type, settings)[source]

Settings for a particular fragment type.

In other words settings that are applied to one of the below:

  • Bond distances

  • Valence angles

  • Torsion angles

  • Ring RMSDs

property analyse

Whether to analyse this fragment type.

property classification_measure

How to measure whether an observation is unusual.

property classification_measure_threshold

The value at which an observation will be found to be unusual.

property few_hits_threshold

Threshold below which a distribution is considered to have too few hits.

property local_density_threshold

Local density threshold used to classify torsions and rings as unusual.

Note that the local density is irrelevant for bonds and angles.

property local_density_tolerance

The local density tolerance.

property min_obs_exact

Minimum acceptable size of an exact distribution.

If there is no distribution containing at least this number of observations the geometry analyser will perform a generalised search according to the criteria specified by other settings.

property min_obs_generalised

Minimum number of observations that the geometry analyser should try to find.

If this is 0 then generalised searches will never be performed.

Similarly, if generalisation has been turned off this setting will not have an effect.
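The interaction between min_obs_exact, min_obs_generalised and the generalisation switch can be summarised as a simple decision. The function below is a sketch of the rules as described above, under the assumption that no other setting intervenes; its name and arguments are hypothetical.

```python
def needs_generalised_search(n_exact_hits, min_obs_exact,
                             min_obs_generalised, generalisation=True):
    """Decide whether a generalised search would be attempted."""
    if not generalisation:
        return False          # generalisation turned off
    if min_obs_generalised == 0:
        return False          # generalised searches are never performed
    # Generalise only when the exact distribution is too small.
    return n_exact_hits < min_obs_exact


decisions = [
    needs_generalised_search(5, min_obs_exact=15, min_obs_generalised=40),
    needs_generalised_search(50, min_obs_exact=15, min_obs_generalised=40),
    needs_generalised_search(5, min_obs_exact=15, min_obs_generalised=0),
    needs_generalised_search(5, 15, 40, generalisation=False),
]
```

The threshold values used here are arbitrary examples, not the system defaults.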

property min_relevance

Relevance criterion for a generalised hit to be accepted.

The geometry analyser determines how similar a fragment is to the query by calculating a relevance value. The min_relevance setting tells the geometry analyser to accept, in a generalised search, only fragments whose relevance is equal to or greater than this threshold.

summary()[source]

Return a summary of the settings as a string.

property zscore_threshold

Z-score threshold used to classify bonds and angles as unusual.

Note that the z-score is irrelevant for torsions and rings.
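The split between the two thresholds can be sketched as follows. The direction of each comparison (a high Z-score or a low local density marks a feature as unusual) and the threshold values 2.0 and 5.0 are assumptions for illustration; the local densities of the unusual torsions in the introductory example, which are all below 5, are consistent with that direction.

```python
def classify(feature_type, z_score=None, local_density=None,
             zscore_threshold=2.0, local_density_threshold=5.0):
    """Return True if the feature would be classed as unusual in this sketch."""
    if feature_type in ('bond', 'angle'):
        # Bonds and angles: judged on Z-score only.
        return z_score > zscore_threshold
    if feature_type in ('torsion', 'ring'):
        # Torsions and rings: judged on local density only.
        return local_density < local_density_threshold
    raise ValueError('unknown feature type: %r' % (feature_type,))


results = [
    classify('bond', z_score=3.1),
    classify('angle', z_score=0.4),
    classify('torsion', local_density=2.78),
    classify('ring', local_density=40.0),
]
```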

property generalisation

Setting determining if searches should be generalised or not.

property heaviest_element

Filter on heaviest element.

This setting tells the geometry analyser to ignore hits from CSD structures that contain elements heavier than the element with the specified atomic symbol.

The atomic symbol is case sensitive.

property impose_upper_limits

Whether there is an upper limit imposed on generalised searches or not.

This setting tells the geometry analyser whether or not to limit the number of levels traversed for generalised searches. Occasionally the geometry analyser can take a very long time to identify similar fragments when performing a generalised search. Limiting the number of levels traversed will reduce the chances of this happening but may also result in fewer hits being found.

property organometallic_filter

Configure how organometallic and organic hits should be filtered.

This setting instructs the geometry analyser to ignore fragments depending on whether they are from organic or organometallic structures.

There are three possible options for this setting:

  • ‘all’

  • ‘metalorganics_only’

  • ‘organics_only’

property powder_filter

Configure whether or not powder structures should be filtered.

This setting instructs the geometry analyser to ignore (if set to True) or retain (if set to False) fragments from powder study analyses.

property rfactor_filter

Filter on R-factor.

Note that there are only four possible settings for this option:

  • 0.05

  • 0.075

  • 0.1

  • any

However, you can set the filter using any value and the appropriate filter will be selected. If the value supplied is greater than 0.1, or if the filter is set to None or ‘any’, the R-factor filter will be set to None.
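One plausible reading of how an arbitrary value maps onto the available filters is sketched below. Only the behaviour for values above 0.1, None and ‘any’ is taken from the description; choosing the smallest allowed threshold that is not below the supplied value is an assumption, and the function name snap_rfactor is hypothetical.

```python
ALLOWED_RFACTORS = (0.05, 0.075, 0.1)


def snap_rfactor(value):
    """Map a requested R-factor onto one of the allowed filters, or None."""
    if value is None or value == 'any':
        return None
    if value > ALLOWED_RFACTORS[-1]:
        return None  # documented: values above 0.1 disable the filter
    for threshold in ALLOWED_RFACTORS:
        # Assumption: round up to the next allowed threshold.
        if value <= threshold:
            return threshold
    return None


snapped = [snap_rfactor(v) for v in (0.05, 0.06, 0.1, 0.2, None, 'any')]
```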

property solvent_filter

Configure how solvents and non-solvents should be filtered.

This setting instructs the geometry analyser to ignore fragments depending on whether they are from solvent or non-solvent molecules.

There are three possible options for this setting:

  • ‘include_solvent’

  • ‘exclude_solvent’

  • ‘only_solvent’
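The filter properties described above can be set together on a settings object. The sketch below assumes these properties are assignable on a ccdc.conformer.GeometryAnalyser.Settings instance, which would then be passed to GeometryAnalyser via its settings argument; the helper name configure_filters is hypothetical, and a stand-in object is used so the snippet runs without the CSD installed.

```python
from types import SimpleNamespace

ORGANOMETALLIC_OPTIONS = ('all', 'metalorganics_only', 'organics_only')
SOLVENT_OPTIONS = ('include_solvent', 'exclude_solvent', 'only_solvent')


def configure_filters(settings, organometallic='organics_only',
                      solvent='exclude_solvent', exclude_powder=True,
                      rfactor=0.05):
    """Apply a consistent set of hit filters to a Settings-like object."""
    if organometallic not in ORGANOMETALLIC_OPTIONS:
        raise ValueError('invalid organometallic_filter: %r' % (organometallic,))
    if solvent not in SOLVENT_OPTIONS:
        raise ValueError('invalid solvent_filter: %r' % (solvent,))
    settings.organometallic_filter = organometallic
    settings.solvent_filter = solvent
    settings.powder_filter = exclude_powder   # True ignores powder structures
    settings.rfactor_filter = rfactor
    return settings


# Stand-in for a real Settings instance.
configured = configure_filters(SimpleNamespace())
```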

summary()[source]

Return a summary of the settings as a string.

analyse_angle(a, b, c)[source]

Perform a geometry analysis on a single valence angle.

Parameters
  • a – ccdc.molecule.Atom

  • b – ccdc.molecule.Atom

  • c – ccdc.molecule.Atom

Returns

ccdc.conformer.GeometryAnalyser.Analysis

Raises

TypeError if the atoms supplied do not make up a bonded angle

analyse_bond(a, b)[source]

Perform a geometry analysis on a single bond.

Parameters
  • a – ccdc.molecule.Atom

  • b – ccdc.molecule.Atom

Returns

ccdc.conformer.GeometryAnalyser.Analysis

Raises

TypeError if the atoms supplied do not form a covalent bond

analyse_molecule(mol, _max_atoms_to_analyse=999)[source]

Perform a geometry analysis of the whole molecule.

Parameters

mol – ccdc.molecule.Molecule to be analysed

Returns

ccdc.molecule.Molecule augmented with analysis data

analyse_ring(*ats)[source]

Perform a geometry analysis on a single ring.

Parameters

*ats – ccdc.molecule.Atom instances that make up the ring

Returns

ccdc.conformer.GeometryAnalyser.Analysis

Raises

TypeError if the atoms supplied do not make up a ring

analyse_torsion(a, b, c, d)[source]

Perform a geometry analysis on a single torsion angle.

Parameters
  • a – ccdc.molecule.Atom

  • b – ccdc.molecule.Atom

  • c – ccdc.molecule.Atom

  • d – ccdc.molecule.Atom

Returns

ccdc.conformer.GeometryAnalyser.Analysis

Raises

TypeError if the atoms supplied do not make up a bonded torsion

property database_files_name

The names of the databases, for example [‘CSD’, ‘Sep23_ASER’]

Returns

a list of str

property database_files_path

The directories of the databases

Returns

a list of str

property database_files_source_db_file_name

The file names of the source databases

Returns

a list of str

fragment_identifier(fragment)[source]

The unique identifier of a particular type of fragment.

This is a string encoding the molecular environment of a fragment.

Parameters

fragment – an instance of ccdc.conformer.GeometryAnalyser.Analysis

Returns

a string of four numbers separated by colons.
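Because the identifier is a colon-separated string, it can be split into its four fields for use as a dictionary key or for grouping. This is a minimal sketch; the helper name is hypothetical, the example identifier is made up, and the fields are kept as strings because their meaning is not specified here.

```python
def split_fragment_identifier(identifier):
    """Split a fragment identifier into its four colon-separated fields."""
    parts = tuple(identifier.split(':'))
    if len(parts) != 4:
        raise ValueError('expected four colon-separated fields, got %d' % len(parts))
    return parts


# Made-up identifier, for illustration only.
fields = split_fragment_identifier('12:345:6:7890')
```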