Conformer API
Introduction
The ccdc.conformer
module contains classes concerned with molecular
conformations.
The three main classes of the ccdc.conformer
module are:
A ccdc.conformer.MoleculeMinimiser
instance can be used to optimise the bond
distances and valence angles of a 3D input molecule using the
ccdc.conformer.MoleculeMinimiser.minimise()
function:
from ccdc.conformer import MoleculeMinimiser
molecule_minimiser = MoleculeMinimiser()
minimised_mol = molecule_minimiser.minimise(mol)
A ccdc.conformer.ConformerGenerator
instance can be used to generate a set of
conformers for an input molecule using the
ccdc.conformer.ConformerGenerator.generate()
function:
from ccdc.conformer import ConformerGenerator
from ccdc.io import MoleculeWriter
conformer_generator = ConformerGenerator()
conformers = conformer_generator.generate(mol)
with MoleculeWriter('conformers.mol2') as mol_writer:
    for c in conformers:
        mol_writer.write(c.molecule)
A ccdc.conformer.GeometryAnalyser
instance can be used to analyse the geometry
of an input molecule using a knowledge-based library of intramolecular
geometries based on the CSD.
The ccdc.conformer.GeometryAnalyser
class contains nested classes:
ccdc.conformer.GeometryAnalyser.Settings
ccdc.conformer.GeometryAnalyser.Analysis
ccdc.conformer.GeometryAnalyser.AnalysisHit
The ccdc.conformer.GeometryAnalyser.analyse_molecule()
function can be used to validate
the complete geometry of a given query structure.
>>> from ccdc.io import EntryReader
>>> csd_reader = EntryReader('CSD')
>>> yigpio01 = csd_reader.molecule('YIGPIO01')
>>> from ccdc.conformer import GeometryAnalyser
>>> analysis_engine = GeometryAnalyser()
>>> checked_mol = analysis_engine.analyse_molecule(yigpio01)
>>> for tor in checked_mol.analysed_torsions:
...     if tor.unusual:
...         print('%s: %d %.2f' % (', '.join(tor.atom_labels), tor.nhits, tor.local_density))
...
C36, C12, C11, N1: 64 3.12
O4, C31, N5, C24: 3394 3.54
O5, C31, N5, C24: 3389 3.92
O5, C32, C33, S1: 105 1.90
O5, C32, C33, C34: 73 4.11
API
Knowledge base version number
Molecule minimisation
-
class
ccdc.conformer.
MoleculeMinimiser
(nthreads=1, parameter_locator=<ccdc.conformer.DefaultConformerParameterFileLocator object>)[source]¶ Minimises a single molecule or a list of molecules.
-
minimise
(mol)[source]¶ Return a minimised copy of the input molecule.
This makes use of the Tripos force field functional forms.
However, where available, equilibrium bond distances and valence angles are parameterised using data obtained from CSD distributions.
Parameters: mol – ccdc.molecule.Molecule
Returns: ccdc.molecule.Molecule
-
Conformer generation
Note
The ConformerGenerator
class is available only to CSD-Discovery, CSD-Materials
and CSD-Enterprise users.
-
class
ccdc.conformer.
ConformerGenerator
(settings=None, skip_minimisation=False, nthreads=1, parameter_locator=<ccdc.conformer.DefaultConformerParameterFileLocator object>)[source]¶ Generates conformers for a single molecule or a list of molecules.
This functionality is available only under licensed conditions. Please contact support@ccdc.cam.ac.uk for details.
-
generate
(mols)[source]¶ Generate conformers for supplied molecule(s).
Parameters: mols – a ccdc.molecule.Molecule or a list of ccdc.molecule.Molecule
Returns: a ccdc.conformer.ConformerHitList or a list of ccdc.conformer.ConformerHitList instances
Note that conformers cannot be generated for molecules with missing coordinates, including those of hydrogen atoms. Such molecules are ignored if they occur in the input.
-
static
lock_torsion
(bond)[source]¶ Specify that a particular torsion should not be changed when generating conformers of its molecule.
Parameters: bond – a ccdc.molecule.Bond
instance.
-
-
class
ccdc.conformer.
ConformerHitList
(identifier, _dr)[source]¶ A conformer generator result.
-
distributions_pruned
¶ Whether or not the geometry distributions were pruned in order to perform an exhaustive search.
-
flexible_rings
¶ The flexible rings considered by the generator.
-
max_log_probability
¶ Maximum log probability.
-
min_log_probability
¶ Minimum log probability.
-
minimised_molecule
¶ The minimised molecule from which conformers were generated.
-
n_flexible_rings_in_molecule
¶ The number of flexible rings in the molecule.
-
n_flexible_rings_sampled
¶ The number of flexible rings sampled by the generator.
This may be smaller than the number of rings in the input molecule if there are no data in the CSD for the ring.
-
n_flexible_rings_with_no_observations
¶ Number of flexible rings for which no crystallographic data is available.
-
n_matched_rotamers
¶ The number of rotamers which have been matched in the fragment_library.txt parameter file.
-
n_rotamers_in_molecule
¶ The number of rotamers in the molecule.
-
n_rotamers_sampled
¶ The number of rotamers sampled by the generator.
This may be smaller than the number of rotamers in the input molecule if there are no data in the CSD for the rotamer.
-
n_rotamers_with_no_observations
¶ Number of rotamers for which no crystallographic data is available.
-
original_molecule
¶ The input molecule.
-
rotamers
¶ The rotamers considered by the generator.
-
rotamers_with_no_observations
¶ The list of bonds for which the CSD was unable to provide enough input data.
-
sampling_limit_reached
¶ Whether the internal sampling limit has been reached.
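The counting attributes above can be combined into a quick coverage report. The following is a minimal sketch rather than part of the API: `sampling_summary` is a hypothetical helper, and a `SimpleNamespace` stands in for a real `ConformerHitList` so that the example runs without a CSD installation.

```python
from types import SimpleNamespace


def sampling_summary(hit_list):
    """Summarise how much of a molecule's flexibility was sampled.

    hit_list is expected to expose the ConformerHitList attributes
    documented above; this helper itself is hypothetical.
    """
    def fraction(sampled, total):
        # A molecule with nothing to sample counts as fully covered.
        return 1.0 if total == 0 else sampled / total

    return {
        'rotamer_coverage': fraction(hit_list.n_rotamers_sampled,
                                     hit_list.n_rotamers_in_molecule),
        'ring_coverage': fraction(hit_list.n_flexible_rings_sampled,
                                  hit_list.n_flexible_rings_in_molecule),
        'rotamers_without_data': hit_list.n_rotamers_with_no_observations,
        'rings_without_data': hit_list.n_flexible_rings_with_no_observations,
        'sampling_limit_reached': hit_list.sampling_limit_reached,
    }


# Stand-in object with made-up numbers, not real generator output.
example = SimpleNamespace(
    n_rotamers_sampled=3, n_rotamers_in_molecule=4,
    n_flexible_rings_sampled=1, n_flexible_rings_in_molecule=1,
    n_rotamers_with_no_observations=1,
    n_flexible_rings_with_no_observations=0,
    sampling_limit_reached=False)
summary = sampling_summary(example)
print(summary)
```

A coverage below 1.0 indicates that some rotamers or rings were not sampled, for example because the CSD holds no data for them.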
-
-
class
ccdc.conformer.
ConformerHit
(mol, parent)[source]¶ An individual conformer.
-
normalised_score
¶ Normalised score associated with this conformer (0 = best, 1 = worst).
-
probability
¶ Probability associated with this conformer.
-
rmsd
(wrt='original', reference=None, exclude_hydrogens=True)[source]¶ Return the RMSD of this conformer with respect to a reference, the original or the minimised molecule.
Parameters: - reference – None or a CSD molecule object. If not None, the RMSD is measured with respect to this reference
- wrt – either ‘original’ or ‘minimised’. This is ignored if a reference molecule is passed in
- exclude_hydrogens – boolean
Returns: float
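One way to use `normalised_score` and `rmsd()` together is to keep only well-scoring conformers that differ noticeably from the minimised molecule. This is a sketch under stated assumptions: `select_conformers` and `fake_hit` are hypothetical names, the thresholds are arbitrary, and the stand-in objects only mimic the two `ConformerHit` members used here.

```python
from types import SimpleNamespace


def select_conformers(hits, max_score=0.5, min_rmsd=0.0):
    """Return hits with normalised_score <= max_score and an RMSD from the
    minimised molecule of at least min_rmsd, best score first."""
    kept = [h for h in hits
            if h.normalised_score <= max_score
            and h.rmsd(wrt='minimised') >= min_rmsd]
    return sorted(kept, key=lambda h: h.normalised_score)


def fake_hit(score, rmsd_value):
    """Hypothetical stand-in mimicking the ConformerHit members used above."""
    return SimpleNamespace(
        normalised_score=score,
        rmsd=lambda wrt='original', reference=None, exclude_hydrogens=True: rmsd_value)


hits = [fake_hit(0.8, 1.2), fake_hit(0.1, 0.9), fake_hit(0.3, 0.05)]
selected = select_conformers(hits, max_score=0.5, min_rmsd=0.1)
print([h.normalised_score for h in selected])
```

With real results, `hits` would be the conformers obtained by iterating over a `ConformerHitList`.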
-
-
class
ccdc.conformer.
ConformerSettings
[source]¶ Settings for conformer generation.
Any settings that are set to None will be set to the system defaults.
-
max_conformers
= None¶ Maximum number of conformers to generate.
-
max_unusual_torsions
= None¶ Number of unusual torsions allowed per conformer.
-
superimpose_conformers_onto_reference
= None¶ Whether or not to superimpose to a common reference.
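Because a setting left at None falls back to the system default, it can be convenient to apply only the overrides that were actually given. The helper below is a hypothetical sketch, not part of ccdc.conformer; a `SimpleNamespace` stands in for a `ConformerSettings` instance so the example runs on its own.

```python
from types import SimpleNamespace


def apply_overrides(settings, **overrides):
    """Set only the attributes given with a non-None value.

    Anything left as None keeps its meaning of "use the system default".
    Unknown names raise AttributeError so that typos are caught early.
    """
    for name, value in overrides.items():
        if not hasattr(settings, name):
            raise AttributeError('unknown setting: %s' % name)
        if value is not None:
            setattr(settings, name, value)
    return settings


# Stand-in for ConformerSettings(); the attribute names come from this page.
settings = SimpleNamespace(max_conformers=None,
                           max_unusual_torsions=None,
                           superimpose_conformers_onto_reference=None)
apply_overrides(settings, max_conformers=50, max_unusual_torsions=None)
print(settings.max_conformers, settings.max_unusual_torsions)
```

With the real library, the configured object would then be passed as the `settings` argument of `ConformerGenerator`.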
-
Geometry analysis
-
class
ccdc.conformer.
GeometryAnalyser
(settings=None, databases=None)[source]¶ The geometry analysis engine.
-
class
Analysis
(analysis, mol, classification, settings, siteless)[source]¶ A single geometric analysis for a specific bond, angle, torsion or ring feature.
-
atom_labels
¶ The labels of atoms in the reference fragment.
-
d_min
¶ Return the distance to the nearest observed value.
If rawscore is not specified, the geometric value of the query fragment will be used.
-
distribution
¶ List of numeric values found by the search.
-
enough_hits
¶ Whether there are enough hits for a sound judgement.
-
few_hits
¶ Whether there are too few hits for a sound judgement.
-
fragment_label
¶ Underscore separated string of atom labels.
-
generalised
¶ Whether or not the analysis for this fragment resulted from a generalised search.
-
histogram
(bin_size=None, minimum=None, maximum=None)[source]¶ Return the histogram of the distribution as a tuple of integers.
This function puts the distribution values into bins according to the criteria specified.
Parameters: - bin_size – defaults to (maximum - minimum)/40 if set to None
- minimum – The minimum value of the distribution range. If None, defaults to 0 for torsions, or the minimum value in the distribution (or the query fragment value if smaller) for other fragment types
- maximum – The maximum value of the histogram range. If None, defaults to 180 for torsions, or the maximum value in the distribution (or the query fragment value if larger) for other fragment types
Returns: tuple of integers
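The default rules described for `bin_size`, `minimum` and `maximum` can be illustrated with a plain-Python sketch. This is not the library's implementation: the function below, its `query` and `is_torsion` arguments, and its handling of values on bin edges are all assumptions made for illustration.

```python
import math


def histogram(values, query=None, is_torsion=False,
              bin_size=None, minimum=None, maximum=None):
    """Bin values into a tuple of counts using the documented defaults."""
    candidates = list(values) + ([query] if query is not None else [])
    if minimum is None:
        # 0 for torsions, else the smallest distribution or query value.
        minimum = 0.0 if is_torsion else min(candidates)
    if maximum is None:
        # 180 for torsions, else the largest distribution or query value.
        maximum = 180.0 if is_torsion else max(candidates)
    if bin_size is None:
        bin_size = (maximum - minimum) / 40
    if bin_size <= 0:
        raise ValueError('histogram range must be non-empty')
    # The small tolerance guards against floating point overshoot.
    n_bins = max(1, math.ceil((maximum - minimum) / bin_size - 1e-9))
    counts = [0] * n_bins
    for v in values:
        if minimum <= v <= maximum:
            # Assumption: a value equal to the maximum goes in the last bin.
            index = min(int((v - minimum) / bin_size), n_bins - 1)
            counts[index] += 1
    return tuple(counts)


counts = histogram([0.0, 4.4, 4.5, 90.0, 180.0], is_torsion=True)
print(len(counts), counts[0], counts[1], counts[20], counts[39])
```

For torsions the defaults give 40 bins of 4.5 degrees each over the range 0 to 180.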
-
hit_identifiers
¶ List of molecule identifiers of the hits in the distribution.
-
hit_molecules
¶ The list of molecules hit by this result.
-
hits
¶ List of ccdc.conformer.GeometryAnalyser.AnalysisHit instances found by the search.
Note that the features below can be extracted from a ccdc.conformer.GeometryAnalyser.AnalysisHit:
- ccdc.conformer.GeometryAnalyser.AnalysisHit.molecule
- ccdc.conformer.GeometryAnalyser.AnalysisHit.atom_indices
- ccdc.conformer.GeometryAnalyser.AnalysisHit.atom_labels
- value of the geometric feature in the hit
For more information see the ccdc.conformer.GeometryAnalyser.AnalysisHit documentation.
-
local_density
¶ Local density of the distribution around the query value.
-
lower_quartile
¶ The lower quartile of the distribution.
-
maximum
¶ The maximum of the distribution.
-
mean
¶ The mean of the distribution.
-
median
¶ The median of the distribution.
-
minimum
¶ The minimum of the distribution.
-
nhits
¶ The number of hits in the distribution.
-
no_hits
¶ Whether the fragment has no data within the CSD.
-
percentile
(p)[source]¶ Return the percentile of the observed value.
Raises: TypeError if the value (p) is not between 0 and 1.
-
standard_deviation
¶ The standard deviation of the distribution.
-
type
¶ The type of geometric feature represented by this result.
In other words, whether this ccdc.conformer.GeometryAnalyser.Analysis was derived from a bond, angle, torsion or ring analysis.
-
unusual
¶ Check if the geometric feature is unusual or not.
If the enough_hits and few_hits parameters are set to True (default behaviour) this function will return True if the geometric feature is classified as unusual.
If the few_hits parameter is set to False this function will only return True if the geometric feature is unusual and there are enough hits to support this claim.
If the enough_hits parameter is set to False this function will only return True if the geometric feature is unusual and there are not enough hits to support this claim.
If both the enough_hits and few_hits parameters are set to False then this function will always return False.
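The four cases above reduce to a small truth table. The sketch below restates that logic in plain Python; `reported_unusual` and its argument names are hypothetical and only mirror the description, not the library's code.

```python
def reported_unusual(is_unusual, has_enough_hits,
                     enough_hits=True, few_hits=True):
    """Return whether a feature is reported as unusual.

    is_unusual      -- the feature was classified as unusual
    has_enough_hits -- the distribution has enough hits to support that
    enough_hits     -- report unusual features backed by enough hits
    few_hits        -- report unusual features backed by too few hits
    """
    if not is_unusual:
        return False
    return enough_hits if has_enough_hits else few_hits


# Print the full table for an unusual feature.
for has_enough in (True, False):
    for enough_flag in (True, False):
        for few_flag in (True, False):
            print(has_enough, enough_flag, few_flag,
                  reported_unusual(True, has_enough, enough_flag, few_flag))
```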
-
upper_quartile
¶ The upper quartile of the distribution.
-
value
¶ Geometric value represented by the reference fragment.
-
z_score
¶ Return the Z-score of the observed value.
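For reference, a Z-score expresses how many standard deviations the observed value lies from the mean of the distribution. The sketch below uses the textbook definition with the sample standard deviation; whether the library uses the sample or the population form is not stated here, so treat that choice as an assumption.

```python
import statistics


def z_score(value, distribution):
    """Textbook Z-score of value against a list of observations."""
    mean = statistics.mean(distribution)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    spread = statistics.stdev(distribution)
    if spread == 0:
        raise ValueError('distribution has zero spread')
    return (value - mean) / spread


# Made-up bond lengths in Angstrom, for illustration only.
observed = [1.50, 1.52, 1.54]
z = z_score(1.56, observed)
print(round(z, 3))
```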
-
-
class
AnalysisHit
(refcode, source, value, _analysis, _distrib, _index)[source]¶ A single geometry analysis hit fragment.
In other words one of the observations that make up the geometry analysis distribution.
-
atom_indices
¶ The indices of the matched atoms in the hit molecule.
-
atom_labels
¶ The labels of the matched atoms in the hit molecule.
-
atoms
¶ The atoms of a hit.
-
bond_length
¶ The bond length of the hit fragment.
Raises: TypeError if the hit is not for a bond length
-
crystal
¶ The hit crystal.
-
entry
¶ The hit entry.
-
identifier
¶ The identifier of the hit.
-
molecule
¶ The hit molecule.
-
similarity_score
¶ The similarity of the matched fragment to the analysed fragment.
This will be 1.0 for an exact match, and a lower value for a generalised match.
-
source_name
¶ The name of the source of the hit.
-
torsion_angle
¶ The absolute value of the torsion angle of the hit fragment. The sign of a torsion angle calculated from a CSD entry is often arbitrary. For example, if the CSD entry is centrosymmetric, for every torsion angle with a positive sign there is, elsewhere in the unit cell, a symmetry-equivalent torsion with a negative sign. Consequently, only the absolute values of torsion angles are used.
Raises: TypeError if the hit is not for a torsion angle
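The reason for using absolute values can be checked numerically: inverting a fragment through the origin, as a centre of symmetry does, flips the sign of its torsion angle but not its magnitude. The sketch below is self-contained and does not use the ccdc API; the coordinates are made up and the sign convention is one common choice.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def signed_torsion(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    normal = _cross(b1, b2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, normal)
    x = _dot(_cross(b0, b1), normal)
    return math.degrees(math.atan2(y, x))


def invert(points):
    """Apply an inversion through the origin to every point."""
    return [tuple(-c for c in p) for p in points]


fragment = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 1.0)]
original = signed_torsion(*fragment)
inverted = signed_torsion(*invert(fragment))
print(original, inverted)
```

The two signed values differ only in sign, so taking the absolute value makes symmetry-equivalent fragments contribute the same observation.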
-
valence_angle
¶ The valence angle of the hit fragment.
Raises: TypeError if the hit is not for a valence angle
-
-
class
Settings
[source]¶ Controls the operation of the geometry analyser.
-
class
GeometrySettings
(identifier, type, settings)[source]¶ Settings for a particular fragment type.
In other words settings that are applied to one of the below:
- Bond distances
- Valence angles
- Torsion angles
- Ring RMSDs
-
analyse
¶ Whether to analyse this fragment type.
-
classification_measure
¶ How to measure whether an observation is unusual.
-
classification_measure_threshold
¶ The value at which an observation will be found to be unusual.
-
few_hits_threshold
¶ Threshold below which a distribution is considered to have too few hits.
-
local_density_threshold
¶ Local density threshold used to classify torsions and rings as unusual.
Note that the local density is irrelevant for bonds and angles.
-
local_density_tolerance
¶ The local density tolerance.
-
min_obs_exact
¶ Minimum acceptable size of an exact distribution.
If there is no distribution containing at least this number of observations the geometry analyser will perform a generalised search according to the criteria specified by other settings.
-
min_obs_generalised
¶ Minimum number of observations that the geometry analyser should try to find.
If this is 0 then generalised searches will never be performed.
Similarly, if generalisation has been turned off this setting will not have an effect.
-
min_relevance
¶ Relevance criterion for a generalised hit to be accepted.
The geometry analyser determines how similar a fragment is to the query by calculating a relevance value. The min_relevance setting tells the geometry analyser to accept, in a generalised search, only fragments whose relevance is equal to or greater than this threshold.
-
zscore_threshold
¶ Z-score threshold used to classify bonds and angles as unusual.
Note that the z-score is irrelevant for torsions and rings.
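The split between the two thresholds can be summarised as follows: bonds and angles are judged by Z-score, torsions and rings by local density. The sketch below is an interpretation, not the library's code. It assumes that a large absolute Z-score or a low local density marks a feature as unusual, and the threshold values in the example are placeholders rather than library defaults.

```python
def is_unusual(feature_type, value, threshold):
    """Classify one measurement against the threshold for its type.

    For 'bond' and 'angle', value is a Z-score and threshold plays the
    role of zscore_threshold. For 'torsion' and 'ring', value is a local
    density and threshold plays the role of local_density_threshold.
    """
    if feature_type in ('bond', 'angle'):
        # Assumption: far from the mean in either direction is unusual.
        return abs(value) > threshold
    if feature_type in ('torsion', 'ring'):
        # Assumption: few observations near the query value is unusual.
        return value < threshold
    raise ValueError('unknown feature type: %r' % feature_type)


# Placeholder thresholds, chosen only for the demonstration.
print(is_unusual('bond', -2.5, 2.0))
print(is_unusual('angle', 0.4, 2.0))
print(is_unusual('torsion', 3.12, 5.0))
```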
-
generalisation
¶ Setting determining if searches should be generalised or not.
-
heaviest_element
¶ Filter on heaviest element.
This setting tells the geometry analyser to ignore hits from CSD structures that contain elements heavier than the element with the specified atomic symbol.
The atomic symbol is case sensitive.
-
impose_upper_limits
¶ Whether or not an upper limit is imposed on generalised searches.
This setting tells the geometry analyser whether or not to limit the number of levels traversed for generalised searches. Occasionally the geometry analyser can take a very long time to identify similar fragments when performing a generalised search. Limiting the number of levels traversed will reduce the chances of this happening but may also result in fewer hits being found.
-
organometallic_filter
¶ Configure how organometallic and organic hits should be filtered.
This setting instructs the geometry analyser to ignore fragments depending on whether they are from organic or organometallic structures.
There are three possible options for this setting:
- ‘all’
- ‘metalorganics_only’
- ‘organics_only’
-
powder_filter
¶ Configure whether or not powder structures should be filtered.
This setting instructs the geometry analyser to ignore (if set to True) or retain (if set to False) fragments from powder study analyses.
-
rfactor_filter
¶ Filter on R-factor.
Note that there are only four possible settings for this option:
- 0.05
- 0.075
- 0.1
- any
However, you can set the filter using any value and the appropriate filter will be selected. Note that if the value supplied is greater than 0.1 the R-factor filter will be set to None. If you set the filter to None or ‘any’ the filter will also be set to None.
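The snapping behaviour can be pictured with a small sketch. How intermediate values are mapped is not spelled out above, so the rule used here, rounding up to the smallest permitted level that is not below the supplied value, is an assumption; `select_rfactor_filter` is a hypothetical name.

```python
R_FACTOR_LEVELS = (0.05, 0.075, 0.1)


def select_rfactor_filter(value):
    """Map a requested R-factor limit onto one of the permitted levels.

    Returns None when no R-factor filter applies.
    """
    if value is None or value == 'any':
        return None
    value = float(value)
    for level in R_FACTOR_LEVELS:
        # Assumption: pick the first level that admits the requested value.
        if value <= level:
            return level
    # Values above 0.1 switch the filter off, as described above.
    return None


for requested in (0.03, 0.06, 0.1, 0.2, 'any', None):
    print(requested, select_rfactor_filter(requested))
```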
-
solvent_filter
¶ Configure how solvents and non-solvents should be filtered.
This setting instructs the geometry analyser to ignore fragments depending on whether they are from solvent or non-solvent molecules.
There are three possible options for this setting:
- ‘include_solvent’
- ‘exclude_solvent’
- ‘only_solvent’
-
-
analyse_angle
(a, b, c)[source]¶ Perform a geometry analysis on a single valence angle.
Params a: ccdc.molecule.Atom
Params b: ccdc.molecule.Atom
Params c: ccdc.molecule.Atom
Returns: ccdc.conformer.GeometryAnalyser.Analysis
Raises: TypeError if the atoms supplied do not make up a bonded angle
-
analyse_bond
(a, b)[source]¶ Perform a geometry analysis on a single bond.
Params a: ccdc.molecule.Atom
Params b: ccdc.molecule.Atom
Returns: ccdc.conformer.GeometryAnalyser.Analysis
Raises: TypeError if the atoms supplied do not form a covalent bond
-
analyse_molecule
(mol, _max_atoms_to_analyse=999)[source]¶ Perform a geometry analysis of the whole molecule.
Params mol: ccdc.molecule.Molecule to be analysed
Returns: ccdc.molecule.Molecule augmented with analysis data
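The augmented molecule can then be scanned for unusual features of every type. The sketch below is hedged in two ways. First, only `analysed_torsions` appears in the example on this page; the names `analysed_bonds`, `analysed_angles` and `analysed_rings` are assumed by analogy and are read with `getattr` so that a missing attribute is simply skipped. Second, `unusual_features` is a hypothetical helper and the stand-in objects carry made-up values.

```python
from types import SimpleNamespace

# Only 'analysed_torsions' is confirmed above; the others are assumed.
FEATURE_ATTRIBUTES = ('analysed_bonds', 'analysed_angles',
                      'analysed_torsions', 'analysed_rings')


def unusual_features(checked_mol):
    """Collect (attribute, fragment_label, value) for unusual features."""
    found = []
    for attribute in FEATURE_ATTRIBUTES:
        for feature in getattr(checked_mol, attribute, []):
            if feature.unusual:
                found.append((attribute, feature.fragment_label, feature.value))
    return found


# Stand-ins mimicking Analysis objects; the values are illustrative only.
checked = SimpleNamespace(
    analysed_bonds=[
        SimpleNamespace(unusual=False, fragment_label='C1_C2', value=1.53)],
    analysed_torsions=[
        SimpleNamespace(unusual=True, fragment_label='C1_C2_C3_N1', value=87.0)])
report = unusual_features(checked)
print(report)
```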
-
analyse_ring
(*ats)[source]¶ Perform a geometry analysis on a single ring.
Params *ats: ccdc.molecule.Atom instances that make up the ring
Returns: ccdc.conformer.GeometryAnalyser.Analysis
Raises: TypeError if the atoms supplied do not make up a ring
-
analyse_torsion
(a, b, c, d)[source]¶ Perform a geometry analysis on a single torsion angle.
Params a: ccdc.molecule.Atom
Params b: ccdc.molecule.Atom
Params c: ccdc.molecule.Atom
Params d: ccdc.molecule.Atom
Returns: ccdc.conformer.GeometryAnalyser.Analysis
Raises: TypeError if the atoms supplied do not make up a bonded torsion
-
fragment_identifier
(fragment)[source]¶ The unique identifier of a particular type of fragment.
This is a string encoding the molecular environment of a fragment.
Parameters: fragment – an instance of ccdc.conformer.GeometryAnalyser.Analysis
Returns: a string of four numbers separated by colons.
-
class