Descriptors API

Introduction

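The ccdc.descriptors module contains classes for calculating molecular, geometric and crystallographic descriptors.

The main classes in the ccdc.descriptors module are:

  • ccdc.descriptors.MolecularDescriptors

  • ccdc.descriptors.GeometricDescriptors

  • ccdc.descriptors.CrystalDescriptors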

API

class ccdc.descriptors.MolecularDescriptors[source]

Namespace for descriptors of a molecular nature.

class AdjacencyMatrixDescriptorCalculator(molecule)[source]

Descriptor calculator for descriptors based on a molecule’s adjacency matrix.

self_returning_walk(k)[source]

Return the number of walks of length k that start and end at the same atom. See Handbook of Molecular Descriptors, page 384, “self-returning walk counts”.

Parameters:

k – the number of steps to walk.

Returns:

float

self_returning_walk_ln(k)[source]

Return the logarithm of the number of walks of length k that start and end at the same atom. See Handbook of Molecular Descriptors, page 384, “self-returning walk counts”.

Parameters:

k – the number of steps to walk.

Returns:

float
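Both methods above are defined in terms of closed walks in the molecular graph. As a sketch, assuming the usual definition that the number of self-returning walks of length k is the trace of A^k, where A is the adjacency matrix, the count can be computed with plain matrix powers. Whether the library includes hydrogens in A, and exactly how self_returning_walk_ln takes the logarithm (for example ln(SRW_k) or ln(1 + SRW_k)), are not stated here; the helper names are hypothetical.

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def self_returning_walk_count(adjacency, k):
    """Hypothetical helper: trace(A**k), the number of closed walks of length k."""
    n = len(adjacency)
    # Start from the identity matrix, i.e. A**0.
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        power = mat_mult(power, adjacency)
    return float(sum(power[i][i] for i in range(n)))


# Heavy-atom graphs of propane (C-C-C) and cyclopropane (three-membered ring).
propane = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
cyclopropane = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(self_returning_walk_count(propane, 2))       # 4.0
print(self_returning_walk_count(propane, 3))       # 0.0 (no odd rings)
print(self_returning_walk_count(cyclopropane, 3))  # 6.0
```

Odd-length counts are zero for any acyclic or even-ring-only graph, which is why these descriptors carry ring information.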

topological_charge_autocorrelation_index(k)[source]

Calculate the topological charge autocorrelation index (see https://pubs.acs.org/doi/pdf/10.1021/ci00019a008).

Parameters:

k – the topological distance (number of bonds) across which to measure.

Returns:

float

class AtomDistanceSearch(molecule)[source]

Fast searching for atoms within a given distance of a point.

atoms_within_range(point, radius)[source]

A tuple of all atoms within the given radius of the given point.
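The documentation does not say how AtomDistanceSearch achieves its speed. One common technique for this kind of query is a uniform grid (cell list): bin the coordinates into cubes, then test only the points in the cells that can reach the query sphere. The following is a hypothetical sketch of that technique on plain (x, y, z) tuples rather than ccdc.molecule.Atom objects; it is not the library's implementation.

```python
import math
from collections import defaultdict


class GridSearch:
    """Hypothetical cell-list search over (x, y, z) tuples; not the ccdc code."""

    def __init__(self, points, cell_size):
        self.points = list(points)
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for index, p in enumerate(self.points):
            self.cells[self._cell(p)].append(index)

    def _cell(self, p):
        return tuple(math.floor(c / self.cell_size) for c in p)

    def within_range(self, point, radius):
        """Return sorted indices of points within radius of point (inclusive)."""
        # Any point within radius lies at most this many cells away per axis.
        reach = math.ceil(radius / self.cell_size)
        cx, cy, cz = self._cell(point)
        hits = []
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                for dz in range(-reach, reach + 1):
                    for index in self.cells.get((cx + dx, cy + dy, cz + dz), ()):
                        if math.dist(point, self.points[index]) <= radius:
                            hits.append(index)
        return tuple(sorted(hits))


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
search = GridSearch(coords, cell_size=2.0)
print(search.within_range((0.0, 0.0, 0.0), 1.6))  # (0, 1)
```

Choosing a cell size close to the typical query radius keeps the number of cells visited per query small.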

class AtomPairDistanceDescriptorCalculator(molecule)[source]

Atom pair distance descriptor calculations.

Parameters:

molecule – a ccdc.molecule.Molecule instance.

element_pair_count(element_a, element_b, distance)[source]

Return the number of pairs of atoms of the two given elements that are separated by the specified minimum path length, i.e. the number of bonds on the shortest path between them. See Handbook of Molecular Descriptors, page 428, “substructure descriptors, atom pairs”.

Parameters:
  • element_a – str. The first element name.

  • element_b – str. The second element name.

  • distance – int. The number of bonds between atoms of the specified elements.

Returns:

float
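To make the descriptor concrete, here is a sketch that counts pairs of atoms with the given elements whose shortest bond path is exactly distance bonds, using breadth-first search over a hypothetical element list and bond list (not ccdc objects). Counting each unordered pair once, including when element_a equals element_b, is an assumption; the library's exact counting rule is not documented here.

```python
from collections import deque


def topological_distances(n_atoms, bonds, start):
    """Breadth-first search: number of bonds on the shortest path from start."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nxt in neighbours[atom]:
            if nxt not in dist:
                dist[nxt] = dist[atom] + 1
                queue.append(nxt)
    return dist


def element_pair_count(elements, bonds, element_a, element_b, distance):
    """Hypothetical helper: unordered pairs (i < j) of the two elements at the given path length."""
    count = 0
    for i in range(len(elements)):
        for j, d in topological_distances(len(elements), bonds, i).items():
            if j > i and d == distance and {elements[i], elements[j]} == {element_a, element_b}:
                count += 1
    return float(count)


# Heavy atoms of acetic acid, CH3-C(=O)-OH: C0-C1, C1=O2, C1-O3.
elements = ["C", "C", "O", "O"]
bonds = [(0, 1), (1, 2), (1, 3)]
print(element_pair_count(elements, bonds, "C", "O", 1))  # 2.0
print(element_pair_count(elements, bonds, "C", "O", 2))  # 2.0
print(element_pair_count(elements, bonds, "O", "O", 2))  # 1.0
```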

class ConnectivityIndices(molecule)[source]

Connectivity index descriptor calculations.

average_connectivity_index(m)[source]

Return the average connectivity index of mth order. See Handbook of Molecular Descriptors, page 85, “connectivity indices”.

Parameters:

m – the path length evaluated.

Returns:

float

connectivity_index(m)[source]

Return the connectivity index of mth order. See Handbook of Molecular Descriptors, page 85, “connectivity indices”.

Parameters:

m – the path length evaluated.

Returns:

float
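As an illustration, assuming the Randić/Kier-Hall simple connectivity index on the hydrogen-depleted graph: the mth-order index sums, over every path of m bonds, the product of δ^(-1/2) over the atoms on the path, where δ is an atom's number of heavy-atom neighbours. The average index here divides by the number of such paths. Both the graph convention and that normalisation are assumptions and may differ from what ConnectivityIndices computes.

```python
import math


def simple_paths(neighbours, m):
    """All simple paths with m bonds, each listed once as a tuple of atom indices."""
    paths = set()

    def extend(path):
        if len(path) == m + 1:
            # A path and its reverse are the same subgraph; keep one orientation.
            paths.add(min(tuple(path), tuple(reversed(path))))
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in range(len(neighbours)):
        extend([start])
    return paths


def connectivity_index(neighbours, m):
    """Simple mth-order connectivity index; assumes no isolated (degree-0) atoms."""
    degree = [len(n) for n in neighbours]
    return sum(math.prod(degree[a] for a in path) ** -0.5
               for path in simple_paths(neighbours, m))


def average_connectivity_index(neighbours, m):
    """Connectivity index divided by the number of mth-order paths (assumed normalisation)."""
    paths = simple_paths(neighbours, m)
    return connectivity_index(neighbours, m) / len(paths) if paths else 0.0


# Hydrogen-depleted graph of propane: C0-C1-C2.
propane = [[1], [0, 2], [1]]
print(round(connectivity_index(propane, 1), 4))          # 1.4142
print(round(average_connectivity_index(propane, 1), 4))  # 0.7071
```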

class InChIGenerator(include_stereo=True, add_hydrogens=True)[source]

Generate InChI molecular descriptors.

The original source of the InChI generation tool is available at https://www.inchi-trust.org/

See Stephen R Heller, Igor Pletnev, Stephen Stein and Dmitrii Tchekhovskoi, J. Cheminformatics, 2015, 7:23 https://doi.org/10.1186/s13321-015-0068-4

Parameters:
  • include_stereo – configure the generator to include stereochemistry (True, default) or ignore stereochemistry (False)

  • add_hydrogens – configure the generator to add hydrogens (True, default) or not add hydrogens (False)

class InChI(inchi_internal)[source]

An InChI object with the following attributes:

Variables:
  • success – a boolean to indicate if the InChI generation was successful

  • inchi – the InChI string

  • key – the InChI key

  • errors – a tuple of InChI generation errors

  • warnings – a tuple of InChI generation warnings

property add_hydrogens

Whether the InChI generator should add missing hydrogens

generate(structure, include_stereo=None, add_hydrogens=None)[source]

Generate InChI.

Parameters:
  • structure – a ccdc.crystal.Crystal or ccdc.molecule.Molecule object

  • include_stereo – set to True or False to override generator’s setting

  • add_hydrogens – set to True or False to override generator’s setting

Returns:

a ccdc.descriptors.MolecularDescriptors.InChIGenerator.InChI instance

Raises:

TypeError if the type of the input structure is invalid

property include_stereo

Whether stereochemistry should be considered.

class MaximumCommonSubstructure(settings=None)[source]

Identifies the maximum common substructure of two molecules.

class Settings[source]

Settings for the MCS calculation.

property check_bond_count

Whether the bond count of an atom should be checked.

property check_bond_polymeric

Whether the polymeric status of a bond should be checked.

property check_bond_type

Whether the bond type should be checked.

property check_charge

Whether the atom charge should be checked.

property check_element

Whether the element should be checked.

property check_hydrogen_count

Whether the atom’s hydrogen count should be checked.

property connected

Whether the substructure should be connected.

Note that finding disconnected maximal substructures is much slower than finding connected ones.

property ignore_hydrogens

Whether hydrogens should be ignored.

search(mol1, mol2, only_edges=False, search_step_limit='unlimited')[source]

Calculate the maximum common substructure between two molecules.

Parameters:
  • mol1, mol2 – ccdc.molecule.Molecule instances.

  • only_edges – bool. If True, the search finds a maximal common substructure matching only the edges (bonds).

  • search_step_limit – positive integer or ‘unlimited’. Controls the maximum number of steps the algorithm takes.

Returns:

a pair of tuples, giving matched ccdc.molecule.Atom and ccdc.molecule.Bond instances.

Raises:

ValueError with invalid input

Note: the run time of this function grows exponentially with molecule size, so it can take a long time on large molecules.

class Overlay(mol1, mol2, atoms=None, invert=False, rotate_torsions=False, with_symmetry=True, match_elements=True)[source]

Overlays two molecules, placing mol2 onto mol1.

property atom_match

Returns pairs of atoms from mol1 and mol2 matched in the overlay

property max_distance

Returns the maximum distance between two equivalent atoms in the overlay (Angstroms)

property molecule

Returns input molecule mol2 transformed to overlay onto mol1

property rmsd

Returns RMSD between the two overlaid molecules

property rmsd_tanimoto

Returns Tanimoto RMSD between the two overlaid molecules

property transformation

Returns Molecule.Transformation object required to overlay mol2 over mol1

class PrincipleAxesAlignedBox(molecule)[source]

The bounding box of the molecule aligned on its principal axes.

The lengths of the box vectors equal the dimensions of the box. The x_vector is the major axis of the molecule, the y_vector the minor axis and the z_vector the minimal axis of the molecule.

property aligned_molecule

The molecule aligned along its principal axes, centred on its centre of geometry.

property volume

The volume of the box.

property x_vector

The vector of the major axis of the box.

property y_vector

The vector of the minor axis of the box.

property z_vector

The vector of the minimal axis of the box.

static atom_angle(a, b, c)[source]

The angle a-b-c, at atom b, subtended by three arbitrary atoms.

Parameters:

a, b, c – ccdc.molecule.Atom instances

Returns:

float - the angle in degrees or None if one of the atoms has no coordinates

static atom_centroid(*atoms)[source]

Define the centroid of the specified atoms.

static atom_distance(a, b)[source]

Distance between two atoms, irrespective of their parent molecules.

Parameters:

a, b – ccdc.molecule.Atom instances

Returns:

float or None if one of the atoms has no coordinates
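atom_distance and atom_angle reduce to elementary vector arithmetic on atomic coordinates. The following stdlib-only sketch works on plain (x, y, z) tuples instead of ccdc.molecule.Atom objects, and mirrors the convention above of returning None when coordinates are missing; the function names are illustrative only.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y, z) points, or None if either is missing."""
    if p is None or q is None:
        return None
    return math.dist(p, q)


def angle(a, b, c):
    """Angle a-b-c at the central point b, in degrees, or None if a point is missing."""
    if a is None or b is None or c is None:
        return None
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    cos_theta = sum(x * y for x, y in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


print(distance((0, 0, 0), (3, 4, 0)))                    # 5.0
print(round(angle((1, 0, 0), (0, 0, 0), (0, 1, 0)), 6))  # 90.0
```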

static atom_plane(*atoms)[source]

Define a plane from the coordinates of the atoms.

Parameters:

atoms – at least three ccdc.molecule.Atom instances.

static atom_torsion_angle(a, b, c, d)[source]

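The torsion angle a-b-c-d, i.e. the angle between the plane through a, b and c and the plane through b, c and d.

Parameters:

a, b, c, d – ccdc.molecule.Atom instances

Returns: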

float - the angle in degrees or None if one of the atoms has no coordinates
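A torsion angle can be computed from the normals of the two planes described above. This sketch uses the common IUPAC (Klyne-Prelog) sign convention, positive for clockwise rotation when viewed along b→c, and works on plain coordinate tuples; whether atom_torsion_angle returns a signed value in the same convention is an assumption.

```python
import math


def sub(u, v):
    return [u[i] - v[i] for i in range(3)]


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def torsion(a, b, c, d):
    """Signed torsion angle a-b-c-d in degrees, between -180 and 180."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of planes abc and bcd
    y = math.hypot(*b2) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


a, b, c = (1, 0, 0), (0, 0, 0), (0, 0, 1)
print(round(torsion(a, b, c, (0, 1, 1)), 6))   # 90.0
print(round(torsion(a, b, c, (0, -1, 1)), 6))  # -90.0
print(round(torsion(a, b, c, (1, 0, 1)), 6))   # 0.0 (eclipsed)
```

Using atan2 rather than acos of the normals' dot product keeps the sign and stays numerically stable near 0 and 180 degrees.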

static atom_vector(atom0, atom1)[source]

Define the vector from atom0 to atom1.

Parameters:

atom0, atom1 – ccdc.molecule.Atom instances

Returns:

GeometricDescriptors.Vector

Raises:

RuntimeError if either atom has no coordinates.

static bond_length(bond)[source]

The length of a bond.

Parameters:

bond – a ccdc.molecule.Bond instance

Returns:

float, or None if an atom of the bond has no coordinates

static overlay(mol1, mol2, atoms=None, invert=False, rotate_torsions=False, with_symmetry=True)[source]

Overlay mol2 on mol1. Deprecated: use ccdc.descriptors.MolecularDescriptors.Overlay instead.

Parameters:
  • mol1 – a ccdc.molecule.Molecule instance

  • mol2 – a ccdc.molecule.Molecule instance

  • atoms – a list of pairs of atoms to use in the overlay, or None for all atoms to be used

  • invert – allow inversions in the overlay

  • rotate_torsions – allow torsional rotations when overlaying

  • with_symmetry – take account of symmetry when overlaying atoms

Returns:

a ccdc.molecule.Molecule instance which is a copy of mol2 overlaid on mol1

Note: if with_symmetry is True and matching atoms are provided, the matching atoms must form a connected structure.

static overlay_rmsds_and_transformation(mol1, mol2, atoms=None, invert=False, rotate_torsions=False, with_symmetry=True)[source]

Overlay mol2 on mol1 and return properties of the overlay. Deprecated: use ccdc.descriptors.MolecularDescriptors.Overlay instead.

Parameters:
  • mol1 – a ccdc.molecule.Molecule instance

  • mol2 – a ccdc.molecule.Molecule instance

  • atoms – a list of pairs of atoms to use in the overlay, or None for all atoms to be used

  • invert – allow inversions in the overlay

  • rotate_torsions – allow torsional rotations when overlaying

  • with_symmetry – take account of symmetry when overlaying atoms

Returns:

a tuple containing a ccdc.molecule.Molecule instance which is a copy of mol2 overlaid on mol1 as entry 0, the rmsd as entry 1, the Tanimoto rmsd as entry 2 and the overlay transformation as entry 3

static point_group_analysis(mol)[source]

Return Schoenflies notation of the point group symmetry of a molecule.

The point group symmetry is returned as a tuple of:

  • order (e.g. 1)

  • symbol (e.g. ‘C1’)

  • description (e.g. ‘Objects in this point group have no symmetry.’)

Parameters:

mol – a ccdc.molecule.Molecule instance

Returns:

(int, str, str)

static ring_centroid(ring)[source]

The centroid of the ring’s atoms.

Parameters:

ring – a ccdc.molecule.Molecule.Ring instance

static ring_plane(ring)[source]

The plane of the ring’s atoms.

Parameters:

ring – a ccdc.molecule.Molecule.Ring instance

static rmsd(mol1, mol2, atoms=None, overlay=False, exclude_hydrogens=True, with_symmetry=True)[source]

Return the RMSD of two molecules.

Both molecules should have the same atoms if atoms is None.

Parameters:
  • atoms – a list of pairs of ccdc.molecule.Atom instances, or None

  • overlay – Whether to overlay the molecules before calculating RMSD

  • exclude_hydrogens – if True, hydrogens are excluded and a heavy-atom RMSD is calculated; if False, an all-atom RMSD is calculated

  • with_symmetry – Whether to allow symmetrical matches

Returns:

float
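Without an overlay, the RMSD is the root-mean-square of the distances between matched atoms. A minimal sketch of that case, assuming matched pairs are given as hypothetical (element, (x, y, z)) tuples in place of the atoms argument; the overlay and symmetry handling of rmsd are not reproduced.

```python
import math


def rmsd(pairs, exclude_hydrogens=True):
    """RMSD over matched atoms with no superposition (the overlay=False case).

    pairs: list of ((element, xyz), (element, xyz)) tuples, a hypothetical
    stand-in for the list of atom pairs accepted by the documented method.
    """
    if exclude_hydrogens:
        pairs = [(a, b) for a, b in pairs if a[0] != "H" and b[0] != "H"]
    if not pairs:
        raise ValueError("no atom pairs to compare")
    total = sum(math.dist(a[1], b[1]) ** 2 for a, b in pairs)
    return math.sqrt(total / len(pairs))


pairs = [
    (("C", (0, 0, 0)), ("C", (0, 0, 1))),
    (("O", (1, 0, 0)), ("O", (1, 0, 1))),
    (("H", (2, 0, 0)), ("H", (5, 0, 0))),
]
print(rmsd(pairs))                           # 1.0 (heavy atoms only)
print(round(rmsd(pairs, exclude_hydrogens=False), 4))  # 1.9149 (all atoms)
```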

class ccdc.descriptors.GeometricDescriptors[source]

A namespace to hold geometric classes and functions.

class Plane(vector, distance, _plane=None)[source]

A plane in 3D.

property distance

The distance of the plane from the origin.

static from_points(*points)[source]

Construct an RMS-fitted plane from points.

property normal

The normal to the plane.

plane_angle(plane)[source]

The angle between the two planes.

plane_distance(plane)[source]

The shortest distance from this plane to another plane.

property plane_vector1

A vector in the plane, normal to the plane’s normal.

property plane_vector2

A vector in the plane, normal to both the plane’s normal and the plane’s plane_vector1.

point_distance(point)[source]

The distance of the point to the plane.

vector_angle(vector)[source]

The angle between the plane and the vector.
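A plane described by a unit normal n and a distance d from the origin contains the points p with n·p = d. Under that assumed representation, the point distance and the angle between two planes follow directly. The sign convention for distances and whether plane_angle folds its result into 0 to 90 degrees are not stated above, so both are assumptions here; the sketch also builds a plane through exactly three points with a cross product rather than the RMS fit used by from_points.

```python
import math


def plane_from_three_points(p0, p1, p2):
    """Return (unit normal, distance from origin) of the plane through three points."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    length = math.hypot(*n)
    if length == 0:
        raise ValueError("points are collinear")
    n = [c / length for c in n]
    return n, sum(n[i] * p0[i] for i in range(3))


def point_distance(plane, point):
    """Signed distance of a point from the plane (positive on the normal's side)."""
    normal, d = plane
    return sum(normal[i] * point[i] for i in range(3)) - d


def plane_angle(plane_a, plane_b):
    """Angle between two planes in degrees, folded into the range 0 to 90."""
    cos_t = abs(sum(x * y for x, y in zip(plane_a[0], plane_b[0])))
    return math.degrees(math.acos(min(1.0, cos_t)))


z_is_one = plane_from_three_points((0, 0, 1), (1, 0, 1), (0, 1, 1))
x_is_zero = plane_from_three_points((0, 0, 0), (0, 1, 0), (0, 0, 1))
print(point_distance(z_is_one, (5, 5, 3)))         # 2.0
print(round(plane_angle(z_is_one, x_is_zero), 6))  # 90.0
```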

class Quaternion(_quaternion=None)[source]

A normalised quaternion suitable for expressing rotations in 3D space.

Quaternions are a convenient way of expressing a complex sequence of rotations.

By default this constructs a unit quaternion.

complex_conjugate()[source]

Convert this quaternion in place to its complex conjugate.

copy()[source]

Create a copy of this quaternion

static from_dimensions(q0, q1, q2, q3)[source]

Create a quaternion from 4 real numbers. The quaternion will be normalised to unit length.

Parameters:

q0, q1, q2, q3 – the 4 components along the axes 1, i, j and k

Raises:

ValueError with invalid input (e.g. if the length of the quaternion is 0)

static from_euler_angles(alpha, beta, gamma, unit='degrees')[source]

Create a quaternion from a set of Euler angles.

static from_vector_and_angle(vector, angle, unit='degrees')[source]

Construct from a vector and an angle.

Parameters:
  • vector – a tuple or list of length 3 that represents a vector in 3D space

  • angle – a double that represents an angle (in degrees by default)

Raises:

ValueError for bad input

invert()[source]

Invert this quaternion in place.

rotate(object_to_rotate, cell=None)[source]

Rotate a set of vectors in place by this quaternion.

rotation_matrix()[source]

Return the rotation matrix that this quaternion currently describes.

square()[source]

Square this quaternion in place.
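For readers new to quaternion rotations, a sketch of the underlying arithmetic: a unit quaternion (cos θ/2, sin θ/2 · axis) rotates vectors by θ about the axis, and its complex conjugate is its inverse. The helpers below are hypothetical and independent of the ccdc class; they take q0 as the real part, matching the from_dimensions description above, but the library's rotation-direction convention is an assumption.

```python
import math


def from_vector_and_angle(axis, angle_degrees):
    """Unit quaternion (q0, q1, q2, q3) = (w, x, y, z) for a rotation about axis."""
    length = math.hypot(*axis)
    if length == 0:
        raise ValueError("axis must be non-zero")
    half = math.radians(angle_degrees) / 2
    s = math.sin(half) / length
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def rotation_matrix(q):
    """3x3 rotation matrix described by a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(q, v):
    """Return a new vector: v rotated by the unit quaternion q."""
    m = rotation_matrix(q)
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def conjugate(q):
    """Complex conjugate; for a unit quaternion this is also the inverse rotation."""
    w, x, y, z = q
    return (w, -x, -y, -z)


q = from_vector_and_angle((0, 0, 1), 90)
print(rotate(q, (1, 0, 0)))  # approximately (0, 1, 0)
```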

class Sphere(centre, radius)[source]

A sphere in 3D.

class Vector(x, y, z)[source]

A 3D vector.

cross(other)[source]

Cross product.

dot(other)[source]

Dot product.

static from_points(p0, p1)[source]

Construct the vector from p0 to p1.

Parameters:

p0, p1 – ccdc.molecule.Coordinates

property length

The length of the vector.

static point_angle(p0, p1, p2)[source]

The angle between three points.

static point_distance(p0, p1)[source]

The distance between two points.

static point_torsion_angle(p0, p1, p2, p3)[source]

The torsion angle between four points.

Note

The powder pattern, morphology, hydrogen-bond coordination and graph set features are available only to CSD-Materials and CSD-Enterprise users.

class ccdc.descriptors.CrystalDescriptors[source]

Namespace for crystallographic descriptors.

class GraphSetSearch(settings=None)[source]

Finds the graph sets of a crystal.

class GraphSet(_graph_set_atoms, _view)[source]

An individual graph set.

property degree

The degree of the graph set, i.e. the number of atoms involved.

property descriptor

The descriptor of the graph set.

property edge_labels

The edge labels of the graph set.

The labels are arbitrary letters identifying a unique hydrogen bond, separated by ‘>’ or ‘<’ indicating the donor-acceptor direction.

property hbonds

The hydrogen bonds of the graph set.

Returns:

a tuple of ccdc.crystal.Crystal.HBond instances.

property label_set

The set of hydrogen bond labels found in the graph set.

property nacceptors

The number of acceptors involved in the graph set.

property ndonors

The number of donors involved in the graph set.

property nmolecules

The number of molecules involved in the graph set.

property period

The period of the graph set, i.e. the number of hydrogen bonds in the repeat unit.

If the graph set is neither a chain nor a ring, this will be -1.

class Settings(hbond_criterion=None)[source]

Configurable settings for the graph set analyser.

property angle_tolerance

The tolerance of the HBond angle.

property distance_range

Allowable distance range for a HBond to be formed.

property intermolecular

Whether HBonds should be intermolecular, intramolecular, or any.

level = 2

Deepest level to search. This is the number of different HBonds involved.

max_chain_size = 4

Longest chain to search.

max_discrete_chain_size = 4

Longest discrete chain to search.

max_ring_size = 6

Largest ring to search.

property path_length_range

The shortest and longest bond-path separation for intramolecular contacts.

property require_hydrogens

Whether hydrogens are required for the HBond.

property vdw_corrected

Whether the distance range is van der Waals corrected.

search(crystal)[source]

Find all graph sets for the crystal subject to the constraints of the settings.

Parameters:

crystal – a ccdc.crystal.Crystal instance.

Returns:

a tuple of ccdc.descriptors.CrystalDescriptors.GraphSetSearch.GraphSet instances.

class HBondCoordination(settings=None, skip_telemetry=False)[source]

Calculate HBond coordination predictions.

The HBondCoordination class is available only to CSD-Materials and CSD-Enterprise users.

class Predictions(crystal, _analysis, _predictions)[source]

The predictions of HBond coordination.

class Observation(label, coordination_count, probability)

A named tuple with the following fields:

  • label – field 0

  • coordination_count – field 1

  • probability – field 2

Being a tuple, it also supports the standard count(value) and index(value[, start[, stop]]) methods; index raises ValueError if the value is not present.

functional_groups_of_hbond(hbond)[source]

The functional group pertaining to a hydrogen-bonding atom.

property is_valid

Whether or not valid predictions were made.

property observed

The predicted probabilities of observed HBonds.

predictions_for_label(label, type='donor')[source]

All the predictions for the atom with the given label.

Returns:

a pair: the observed HBond coordination number, and a dictionary mapping each HBond coordination number to its predicted probability.

to_csv(separator=',')[source]

Format the predictions suitable for a csv file.

Parameters:

separator – a separation string, or None.

Returns:

if separator is None a tuple of lists of components, otherwise a separated string of components.

class Settings[source]

Settings pertaining to the calculation of coordination predictions.

property coordination_models_path

The directory in which the coordination models may be found.

predict(crystal)[source]

Calculate HBond coordination likelihoods for the crystal.

Returns:

a ccdc.descriptors.CrystalDescriptors.HBondCoordination.Predictions instance.

class HBondPropensities(settings=None)[source]

Calculates HBond propensities.

class FittingData(_fitting_data=None, identifiers=None, databases=None)[source]

The collection of entries used for the prediction.

class FittingDataEntry(_fitting_item)[source]

An individual entry with associated matching data.

property identifier

The identifier of the fitting data item.

advice_comment(functional_group=None)[source]

A string indicating whether there are enough data for propensity predictions.

Note: when the fitting data are first created, no substructure matching has been performed, so results for particular groups will be misleadingly poor. Results will be valid after ccdc.descriptors.CrystalDescriptors.HBondPropensities.match_fitting_data() has been called.

static from_file(file_name, databases)[source]

Reads fitting data from a file.

nitems(functional_group=None)[source]

The number of items representing the functional group.

write(file_name)[source]

Writes the fitting data to a file.

class FunctionalGroup(_model_group)[source]

A functional group capable of hydrogen bonding.

property identifier

The name of the functional group.

matches(molecule)[source]

The substructure search matches of the functional group.

class HBond(hbp, _outcome)[source]

A putative HBond in the propensity calculation.

Variables:
class HBondAcceptor(_analysis)[source]

A potential acceptor atom.

This class will be augmented with the evidence found during match_fitting_data().

property acceptor_atom_type

A string representation of the atom’s acceptor type.

property accessible_surface_area

The accessible surface area of the HBond atom.

property atom

The ccdc.molecule.Atom of the HBondAtom.

property functional_group_identifier

The identifier of the functional group for this atom.

property identifier

The full identifier of this atom.

property label

The label of the atom in the original structure.

property nlone_pairs

The number of lone pairs associated with this atom.

class HBondAtom(_analysis)[source]

Base class for HBondDonor and HBondAcceptor.

property accessible_surface_area

The accessible surface area of the HBond atom.

property atom

The ccdc.molecule.Atom of the HBondAtom.

property functional_group_identifier

The identifier of the functional group for this atom.

property identifier

The full identifier of this atom.

property label

The label of the atom in the original structure.

property nlone_pairs

The number of lone pairs associated with this atom.

class HBondDonor(_analysis)[source]

A potential donor atom.

This class will be augmented with the evidence found during match_fitting_data().

property accessible_surface_area

The accessible surface area of the HBond atom.

property atom

The ccdc.molecule.Atom of the HBondAtom.

property donor_atom_type

A string representation of the atom’s donor type.

property functional_group_identifier

The identifier of the functional group for this atom.

property identifier

The full identifier of this atom.

property label

The label of the atom in the original structure.

property nlone_pairs

The number of lone pairs associated with this atom.

class HBondGrouping(hbond_propensities, _outcome)[source]

A grouping of interactions between donors and acceptors representing a possible hbond network.

This represents a point in the chart of Mercury’s HBondPropensity wizard.

Variables:
class InterPropensity(hbp, _row)[source]

Predicted propensity for a single HBond.

property acceptor_component

The component number of the acceptor in the target structure.

property acceptor_label

The label of the acceptor atom.

property acceptor_rank

The rank number of the acceptor.

property bounds

The lower and upper bounds of the prediction.

property donor_component

The component number of the donor in the target structure.

property donor_label

The label of the donor atom.

property donor_rank

The rank number of the donor.

property hbond_count

The number of instances of the hbond observed in the target structure.

property is_acceptor_bifurcated

Whether the acceptor is bifurcated in the target structure.

property is_donor_bifurcated

Whether the donor is bifurcated in the target structure.

property is_intermolecular

Whether or not the predicted propensity is for an intermolecular HBond.

property is_observed

Whether the hbond is observed in the target structure.

property predictive_error

The error in the prediction.

property propensity

The predicted value.

property scores

The calculated values and statistics for the hbond prediction.

property uncertainty

The uncertainty in the prediction.

class IntraPropensity(hbp, _row)[source]

Predicted propensity for an intramolecular HBond.

property acceptor_component

The component number of the acceptor in the target structure.

property acceptor_label

The label of the acceptor atom.

property acceptor_rank

The rank number of the acceptor.

property bounds

The lower and upper bounds of the prediction.

property donor_component

The component number of the donor in the target structure.

property donor_label

The label of the donor atom.

property donor_rank

The rank number of the donor.

property hbond_count

The number of instances of the hbond observed in the target structure.

property is_acceptor_bifurcated

Whether the acceptor is bifurcated in the target structure.

property is_donor_bifurcated

Whether the donor is bifurcated in the target structure.

property is_intermolecular

Whether or not the predicted propensity is for an intermolecular HBond.

property is_observed

Whether the hbond is observed in the target structure.

property predictive_error

The error in the prediction.

property propensity

The predicted value.

property scores

The calculated values and statistics for the hbond prediction.

property uncertainty

The uncertainty in the prediction.

class Model(_model)[source]

The logistic regression model.

class Coefficient(_coefficient)[source]

A coefficient of the regression model.

property confidence_interval

The upper and lower bounds of the coefficient.

property estimate

The estimate of the coefficient.

property identifier

The identifier of the coefficient.

property is_baseline

Whether or not the coefficient is used for the baseline calculation.

property p_value

P-value of the coefficient.

property significance_code

A string representation of how significant the parameter is.

‘***’ for P-value < 0.001, ‘**’ < 0.01, ‘*’ < 0.05 and ‘.’ < 0.1

property standard_error

Standard error of the coefficient.

property z_value

Z-value of the coefficient.

class Parameter(_crystal_structure_property)[source]

A named parameter of the regression.

calculate(donor, acceptor)[source]

The value of this property for the pair of atoms.

Parameters:
  • donor – a ccdc.descriptors.CrystalDescriptors.HBondPropensities.HBondDonor instance.

  • acceptor – a ccdc.descriptors.CrystalDescriptors.HBondPropensities.HBondAcceptor instance.

Returns:

float

property identifier

The identifier of the parameter.

property advice_comment

A string representing the quality of the discrimination based on the ROC.

property akaike_information_criterion

The Akaike Information Criterion (AIC) of the model.

property area_under_roc_curve

Area under the ROC curve.

property coefficients

The coefficients of the model.

property equation

The regression equation.
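Because the model is a logistic regression, a predicted propensity is the logistic function of a linear combination of parameter values weighted by the coefficient estimates. A minimal sketch; the parameter names and numbers are hypothetical and are not fitted output from this class.

```python
import math


def propensity(intercept, coefficients, values):
    """Logistic-regression probability: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficient estimates and parameter values, for illustration only.
coefficients = {"param_a": 0.8, "param_b": 0.5}
values = {"param_a": 1.0, "param_b": 2.0}
print(round(propensity(-1.8, coefficients, values), 3))  # 0.5
```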

property log_likelihood

The log likelihood of the model.

property log_likelihood_test_p_value

The P-value of the log likelihood of the model.

property null_deviance

The null deviance of the model.

property null_deviance_degrees_of_freedom

The degrees of freedom of the null deviance of the model.

property residual_deviance

The residual deviance of the model.

property residual_deviance_degrees_of_freedom

The number of degrees of freedom of the residual deviance of the model.

class Propensity(hbp, _row)[source]

Base class for inter- and intra-molecular propensity predictions.

Variables:
property acceptor_component

The component number of the acceptor in the target structure.

property acceptor_label

The label of the acceptor atom.

property acceptor_rank

The rank number of the acceptor.

property bounds

The lower and upper bounds of the prediction.

property donor_component

The component number of the donor in the target structure.

property donor_label

The label of the donor atom.

property donor_rank

The rank number of the donor.

property hbond_count

The number of instances of the hbond observed in the target structure.

property is_acceptor_bifurcated

Whether the acceptor is bifurcated in the target structure.

property is_donor_bifurcated

Whether the donor is bifurcated in the target structure.

property is_observed

Whether the hbond is observed in the target structure.

property predictive_error

The error in the prediction.

property propensity

The predicted value.

property scores

The calculated values and statistics for the hbond prediction.

property uncertainty

The uncertainty in the prediction.

class Settings[source]

Settings pertaining to the HBond propensity calculation.

property databases

The databases to be used for the prediction.

Note: the databases MUST be SQLite ASER databases for the moment.

property limit_identifier_list

A list of identifiers to limit the search

property working_directory

The working directory for the predictions.

analyse_fitting_data()[source]

Perform a hydrogen bond analysis of the fitting data.

calculate_propensities(crystal=None)[source]

Apply the regression equation to a crystal.

Parameters:

crystal – a ccdc.crystal.Crystal instance or None. If None, the target structure will be used.

property fitting_data

The fitting data.

generate_hbond_groupings(min_donor_prob=None, min_acceptor_prob=None)[source]

Generate all possible permutations of donors and acceptors to create all possible hbond groupings.

hbond_atoms(crystal=None)[source]

The HBondDonor and HBondAcceptor atoms of a crystal.

Parameters:

crystal – a ccdc.crystal.Crystal instance, or None, in which case the HBondAtoms of the target will be returned.

Returns:

a pair of tuples of ccdc.descriptors.CrystalDescriptors.HBondPropensities.HBondDonor and ccdc.descriptors.CrystalDescriptors.HBondPropensities.HBondAcceptor.

make_fitting_data()[source]

Deprecated method. Please use match_fitting_data(), or use CrystalDescriptors.HBondPropensities.FittingData.from_file() to limit the entries that are searched.

Returns an object that will cause all of the database entries to be searched.

match_fitting_data(count=None, verbose=False)[source]

Reduce the fitting data so that each functional group has at least the specified number of examples.

perform_regression()[source]

Performs the logistic regression.

property propensities

The inter- and intra-propensities of the prediction.

set_target(crystal)[source]

Sets a single target for the propensity calculation.

Parameters:

crystal – a ccdc.crystal.Crystal instance.

show_fitting_data_counts(data=None)[source]

Shows the matched counts for each functional group.

target_hbond_grouping()[source]

The HBond grouping that corresponds to the target structure.

class Morphology(crystal=None)[source]
class Facet(_facet, _perpendicular_distance, _miller_indices)

One of the facets of a morphology.

property area

The area of the facet polygon.

property centre_of_geometry

The centre of geometry of the facet.

property coordinates

The coordinates of the vertices of the facet.

property edges

The edges making up the facet.

property miller_indices

The Miller indices of the facet.

property perpendicular_distance

The perpendicular distance from the origin.

property plane

The plane of the facet.

This is a ccdc.descriptors.GeometricDescriptors.Plane instance.

class OrientedBoundingBox(morphology)

The bounding box of the morphology.

This box is not necessarily axis-aligned.

property corners

The eight points forming the corners of the bounding box.

property major_length

The length of the major axis of the bounding box.

property median_length

The length of the middle axis of the bounding box.

property minor_length

The length of the minor axis of the bounding box.

property volume

The volume of the bounding box.

property bounding_box

The bounding box of the morphology.

A pair of ccdc.molecule.Coordinates representing the minimum and maximum corners of the box.

property centre_of_geometry

The centroid of the morphology.

property facets

The facets making up the morphology.

static from_file(file_name)

Creates a Morphology instance from a CIF file.

The CIF file should be one written by this class or by Mercury; such files include a scaling for each of the perpendicular distances.

static from_growth_rates(crystal, growth_rates)

Creates a morphology from an iterable of growth rates.

Parameters:
property oriented_bounding_box

The minimum volume box of the morphology.

This will not necessarily be aligned to the orthonormal Cartesian axes.

relative_area(miller_indices)

The relative area of the facet.

This is what is usually called the Morphological Importance of a facet.

property scale_factor

The factor by which the morphology is scaled.

property volume

The volume of the morphology.

This is calculated stochastically, rather than analytically, so has some error.

write(file_name, keep_all_indices=False)

Write this morphology to a CIF file.

class PoreAnalyser(crystal, settings=None)[source]

Performs a pore analysis of a crystal. crystal is a ccdc.crystal.Crystal instance.

class Flags[source]

Flags for the validity of cached variables.

property calculator_is_valid

Whether the cached calculator is valid.

class Settings[source]

Settings for PoreAnalyser

property cutoff_distance

Cut-off distance (Å)

property grid_spacing

Grid spacing (Å)

property he_probe_epsilon

UFF Lennard-Jones epsilon/k for the He probe (K)

property he_probe_sigma

UFF Lennard-Jones sigma for the He probe (Å)

property n_probe_sigma

UFF Lennard-Jones sigma for the N probe (Å)

property samples_per_atom

Sample size for the surface area calculation

set_to_defaults()[source]

Reset the settings to their default values.

property temperature

Temperature (K)

convert_a3_to_cm3_per_g(volume)[source]

Utility to convert a volume in cubic angstroms (Å^3) to cm^3 per g.
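The per-mass and density results rest on simple unit conversions. The sketch below assumes a unit-cell volume in Å^3, an area in Å^2 and a cell mass in g/mol, as documented for the result properties; since convert_a3_to_cm3_per_g takes only a volume, the library presumably supplies the mass itself, whereas these hypothetical helpers take it explicitly.

```python
AVOGADRO = 6.02214076e23  # mol^-1 (exact SI value)
A3_TO_CM3 = 1e-24         # 1 Å^3 = 1e-24 cm^3
A2_TO_M2 = 1e-20          # 1 Å^2 = 1e-20 m^2


def a3_to_cm3_per_g(volume_a3, cell_mass_g_per_mol):
    """Volume per unit cell (Å^3) -> cm^3 per gram of material."""
    grams_per_cell = cell_mass_g_per_mol / AVOGADRO
    return volume_a3 * A3_TO_CM3 / grams_per_cell


def a2_to_m2_per_g(area_a2, cell_mass_g_per_mol):
    """Surface area per unit cell (Å^2) -> m^2 per gram of material."""
    grams_per_cell = cell_mass_g_per_mol / AVOGADRO
    return area_a2 * A2_TO_M2 / grams_per_cell


def density_g_per_cm3(cell_mass_g_per_mol, cell_volume_a3):
    """Density from the unit-cell mass (g/mol) and volume (Å^3)."""
    return (cell_mass_g_per_mol / AVOGADRO) / (cell_volume_a3 * A3_TO_CM3)


print(round(density_g_per_cm3(1000.0, 1000.0), 4))  # 1.6605
print(round(a3_to_cm3_per_g(500.0, 1000.0), 4))     # 0.3011
```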

property max_pore_diameter

Result: maximum pore diameter (Å)

property network_accessible_geometric_volume

Result: network accessible geometric pore volume (Å^3)

property network_accessible_helium_volume

Result: network accessible He pore volume (Å^3)

property network_accessible_surface_area

Result: network accessible surface area (Å^2)

property network_accessible_surface_area_per_mass

Result: network accessible surface area per mass (m^2/g)

property network_accessible_surface_area_per_volume

Result: metwork accessible surface area per volume (m^2/cm^3)

property num_percolated_dimensions

Result: Number of percolated dimensions

property pore_limiting_diameter

Result: Pore limiting diameter (A)

property system_density

Result: density (g/cm^3)

property system_mass

Result: mass of unit cell (g/mol)

property system_volume

Result: volume of unit cell (A^3)

property total_geometric_volume

Result: geometric pore volume (A^3)

property total_helium_volume

Result: He pore volume (A^3)

property total_surface_area

Result: surface area (A^2)

property total_surface_area_per_mass

Result: surface area per mass (m^2/g)

property total_surface_area_per_volume

Result: surface area per volume (m^2/cm^3)
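The per-mass and per-volume results are related to the totals through the unit-cell mass and volume. Assuming they are derived from total_surface_area, system_volume and system_mass in the obvious way, the relationships can be sketched as follows; derived_pore_results is a hypothetical helper, not part of PoreAnalyser.

```python
AVOGADRO = 6.02214076e23  # mol^-1
M2_PER_A2 = 1e-20         # 1 Å = 1e-10 m, so 1 Å^2 = 1e-20 m^2
CM3_PER_A3 = 1e-24        # 1 Å = 1e-8 cm, so 1 Å^3 = 1e-24 cm^3


def derived_pore_results(surface_area_a2, cell_volume_a3, cell_mass_g_per_mol):
    """Hypothetical helper relating the totals to the per-mass/per-volume results."""
    cell_mass_g = cell_mass_g_per_mol / AVOGADRO   # mass of one unit cell
    cell_volume_cm3 = cell_volume_a3 * CM3_PER_A3  # volume of one unit cell
    area_m2 = surface_area_a2 * M2_PER_A2
    return {
        "density_g_per_cm3": cell_mass_g / cell_volume_cm3,
        "area_per_mass_m2_per_g": area_m2 / cell_mass_g,
        "area_per_volume_m2_per_cm3": area_m2 / cell_volume_cm3,
    }


results = derived_pore_results(100.0, 1000.0, 602.214076)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```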

class PowderPattern(_pattern, _settings=None, _simulation=None, _crystal=None)[source]

Represents a powder pattern.

The powder pattern class is available only to CSD-Materials and CSD-Enterprise users.

class PreferredOrientation(values=None, _function=None)[source]

A preferred orientation for PXRD simulation.

property h

The h Miller index of the preferred orientation.

property k

The k Miller index of the preferred orientation.

property l

The l Miller index of the preferred orientation.

property r

The March-Dollase r value of the preferred orientation.
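For reference, the standard March-Dollase weight for a reflection whose scattering vector makes an angle α with the preferred-orientation direction is P(α) = (r² cos²α + sin²α / r)^(-3/2), with r = 1 meaning no preferred orientation. The sketch below evaluates that textbook form; how the ccdc simulation normalises it or averages over symmetry-equivalent directions is not stated here, and the angle helper is valid only for cubic cells (an assumption made to keep the example short). Both function names are hypothetical.

```python
import math


def march_dollase(r, alpha):
    """March-Dollase preferred-orientation weight for one reflection.

    r: March-Dollase parameter (r = 1 means no preferred orientation).
    alpha: angle (radians) between the preferred-orientation direction
           and the scattering vector of the reflection.
    """
    return (r ** 2 * math.cos(alpha) ** 2 + math.sin(alpha) ** 2 / r) ** -1.5


def angle_between_hkl_cubic(hkl1, hkl2):
    """Angle (radians) between two reciprocal-lattice directions; cubic cells only."""
    dot = sum(a * b for a, b in zip(hkl1, hkl2))
    n1 = math.sqrt(sum(a * a for a in hkl1))
    n2 = math.sqrt(sum(b * b for b in hkl2))
    cos_a = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp rounding errors
    return math.acos(cos_a)


# Weight of a (111) reflection for a (001) preferred orientation with r = 0.8.
alpha = angle_between_hkl_cubic((1, 1, 1), (0, 0, 1))
print(march_dollase(0.8, alpha))
```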

class Settings(_settings=None)[source]

Settings for a powder pattern simulation.

property deuterium_is_hydrogen

Whether to treat deuterium as hydrogen.

Returns:

True or False

property fast_peak_shape: bool

Whether to use a fast but less accurate peak shape calculation during simulation.

The peak shape is applied by fast Fourier transform (FFT) convolution of the peak shape function. This is faster but less accurate than the default convolution method, and the resulting peaks tend to be wider.

Returns:

True or False

property full_width_at_half_maximum

The full width at half maximum (FWHM) of the peaks used in the simulation.

Returns:

float representing the full width at half maximum of the peaks (in degrees)

property include_hydrogens

Whether to include hydrogens in the simulation.

Returns:

True or False

property march_dollase_preferred_orientation

The March-Dollase preferred orientation setting.

Returns:

a PXRDMatchOptimiser.Settings.PreferredOrientation or None

The default value is None. This can be set with a tuple of (h, k, l, r).

property second_wavelength

Get or set the secondary wavelength.

Parameters:

value – float or pair of floats (wavelength and scale factor) or another Wavelength object or None to remove the secondary wavelength

Returns:

Secondary wavelength object (or None) for the simulation

property slit_type: str

The type of slit to be simulated

This may be ‘fixed’ (default) or ‘variable’.

Returns:

string representing type of slit to be simulated

property two_theta_maximum

Where to end the pattern simulation.

Returns:

float representing the maximum 2-theta (in degrees)

property two_theta_minimum

Where to start the pattern simulation.

Returns:

float representing the minimum 2-theta (in degrees)

property two_theta_step

The step size used in the pattern simulation.

Returns:

float representing the step size (in degrees)
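The two-theta range, step and peak width settings together define the simulated profile. The sketch below sums Gaussian peaks directly on a grid running from two_theta_minimum to two_theta_maximum in steps of two_theta_step, using FWHM = 2√(2 ln 2)·σ. The Gaussian shape and the simulate_profile helper are assumptions for illustration; the peak shape used by the ccdc simulation may differ, and this direct summation is not the FFT route enabled by fast_peak_shape.

```python
import math


def simulate_profile(peaks, two_theta_min, two_theta_max, two_theta_step, fwhm):
    """Sum Gaussian peaks on a regular 2-theta grid (all angles in degrees).

    peaks: iterable of (two_theta_position, intensity) pairs, e.g. taken from
    tick marks and their intensities (a hypothetical input format).
    """
    peaks = list(peaks)
    # For a Gaussian, FWHM = 2 * sqrt(2 ln 2) * sigma.
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    n_points = int(round((two_theta_max - two_theta_min) / two_theta_step)) + 1
    grid = [two_theta_min + i * two_theta_step for i in range(n_points)]
    profile = [
        sum(h * math.exp(-0.5 * ((x - c) / sigma) ** 2) for c, h in peaks)
        for x in grid
    ]
    return grid, profile


grid, profile = simulate_profile([(20.0, 100.0), (25.0, 40.0)], 10.0, 30.0, 0.01, 0.2)
```

At half the FWHM away from a peak centre the profile drops to half the peak height, which is what the width setting controls.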

property wavelength

Get or set the primary wavelength.

Parameters:

value – float or pair of floats (wavelength and scale factor) or another Wavelength object or None to reset to the default

Returns:

Primary wavelength object (or None) for the simulation

class TickMark(_tick, _crystal=None)[source]

A tick mark in a simulated powder pattern.

property is_systematically_absent

Whether this tick mark is systematically absent.

property miller_indices

The Miller indices of this tick mark.

property two_theta

Two theta value of this tick.

class Wavelength(wavelength=None, scale_factor=1.0, _wavelength=None)[source]

Represents a wavelength for powder studies.

Some standard wavelengths are provided; note that these are floats, not ccdc.descriptors.CrystalDescriptors.PowderPattern.Wavelength instances.

property scale_factor

The scale factor of this Wavelength.

property wavelength

The wavelength.

property esd

The array of ESD values (estimated standard deviations).

static from_crystal(crystal, settings=None)[source]

Create a CrystalDescriptors.PowderPattern from a crystal.

Parameters:
static from_file(file_name, format=None, default_wavelength=None)[source]

Create a CrystalDescriptors.PowderPattern from a file.

format may take one of the following values:

  • 'xy': XY format (2 columns: 2theta and intensity)

  • 'xye': XYE format (3 columns: 2theta, intensity and ESD)

  • 'xrdml': Panalytical XRDML format

If format is None, it is deduced from the filename extension.

Parameters:
  • file_name – path to the file

  • format – string indicating the format to expect; if None will deduce from filename extension

  • default_wavelength – Default wavelength used if no wavelength found/parsed in file
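The xy and xye formats are plain whitespace-separated columns, so a minimal reader is short. The sketch below assumes one data row per line and treats blank lines and lines starting with '#' as comments (a guess; real files may use other header conventions). read_xy_or_xye is hypothetical and is not how from_file is implemented.

```python
import io


def read_xy_or_xye(stream):
    """Parse whitespace-separated columns: 2theta, intensity and optional ESD.

    Blank lines and lines starting with '#' are skipped (an assumption).
    Returns (two_theta, intensity, esd), where esd is None unless every
    data row has a third column.
    """
    two_theta, intensity, esd = [], [], []
    for line in stream:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        two_theta.append(float(fields[0]))
        intensity.append(float(fields[1]))
        if len(fields) >= 3:
            esd.append(float(fields[2]))
    return two_theta, intensity, (esd if len(esd) == len(two_theta) else None)


xye_text = "# 2theta intensity esd\n5.00 100.0 10.0\n5.02 121.0 11.0\n"
tt, counts, esds = read_xy_or_xye(io.StringIO(xye_text))
```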

static from_xrdml_file(file_name, default_wavelength=None)[source]

Create a CrystalDescriptors.PowderPattern from a Panalytical XRDML file.

See https://www.malvernpanalytical.com/en/products/category/software/x-ray-diffraction-software/data-collector for details of the XRDML format.

Parameters:
  • file_name – path to XRDML file

  • default_wavelength – Default wavelength used if no wavelength found/parsed in file

static from_xy_file(file_name, default_wavelength=None)[source]

Create a CrystalDescriptors.PowderPattern from an xy file.

Parameters:
  • file_name – path to xy file

  • default_wavelength – Default wavelength used if no wavelength found/parsed in file

static from_xye_file(file_name, default_wavelength=None)[source]

Create a CrystalDescriptors.PowderPattern from an xye file.

Parameters:
  • file_name – path to xye file

  • default_wavelength – Default wavelength used if no wavelength found/parsed in file

integral(start=0.0, end=180.0)[source]

The area under the curve between start and end.

Parameters:
  • start – float; the lower 2-theta limit (in degrees)

  • end – float; the upper 2-theta limit (in degrees)

Returns:

float
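A common way to compute such an area from sampled data is the trapezoidal rule. The sketch below assumes start and end are 2-theta limits in degrees and, for simplicity, does not interpolate to the exact window edges; trapezoid_integral is a hypothetical helper, and the ccdc method may integrate differently.

```python
def trapezoid_integral(two_theta, intensity, start=0.0, end=180.0):
    """Trapezoidal area under intensity versus 2-theta, limited to [start, end].

    Only sample points inside the window are used; the curve is not
    interpolated to the exact window edges (a simplification).
    """
    pts = [(x, y) for x, y in zip(two_theta, intensity) if start <= x <= end]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0, 3.0, 4.0]
print(trapezoid_integral(x, y))            # whole range
print(trapezoid_integral(x, y, 1.0, 3.0))  # window from 1 to 3 degrees
```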

property intensity

The array of intensity values.

resetWavelength(new_wavelength=None)[source]

Reset the wavelength of an existing powder pattern.

Parameters:

new_wavelength – the new wavelength; if omitted (None), the wavelength is reset to 1.54056 Å (Cu Kα1)
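Changing the wavelength moves each reflection according to Bragg's law, λ = 2d sin θ, with the d-spacing fixed. Whether resetWavelength rescales an existing two_theta array is not stated here; the sketch (convert_two_theta is hypothetical) only shows the relationship, using 1.54056 Å as the reference wavelength.

```python
import math

CU_KALPHA1 = 1.54056  # Å; the default quoted for resetWavelength


def convert_two_theta(two_theta_deg, old_wavelength, new_wavelength):
    """Return the 2-theta (degrees) of the same reflection at a new wavelength.

    Uses Bragg's law, wavelength = 2 d sin(theta), with d held fixed.
    Returns None when the reflection cannot be observed at the new wavelength.
    """
    theta = math.radians(two_theta_deg) / 2.0
    d = old_wavelength / (2.0 * math.sin(theta))
    s = new_wavelength / (2.0 * d)  # sin(theta) at the new wavelength
    if s > 1.0:
        return None
    return 2.0 * math.degrees(math.asin(s))


mo = 0.70930  # Å (Mo Kα1), an example shorter wavelength
shifted = convert_two_theta(30.0, CU_KALPHA1, mo)
print(shifted)
```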

similarity(other, width=2.0, use_esds=True)[source]

Measure of match between this pattern and another.

This uses the cross-correlations described in R. de Gelder, R. Wehrens, J.A. Hageman (2001) J. Comp. Chem. 22:273-289. https://doi.org/10.1002/1096-987X(200102)22:3%3C273::AID-JCC1001%3E3.0.CO;2-0

Parameters:
  • other – a ccdc.descriptors.CrystalDescriptors.PowderPattern

  • width – width (in degrees) of the base of the triangle weight function

  • use_esds – whether to use the patterns' estimated standard deviations on the counts as weights in the calculation

Returns:

float that represents the similarity of the two patterns
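The de Gelder measure compares two patterns through a weighted cross-correlation, c_fg = Σ_r w(r) Σ_x f(x) g(x + r), normalised as S = c_fg / √(c_ff c_gg), so identical patterns score 1. The sketch below assumes both patterns are sampled on the same 2-theta grid and that width is the full base of the triangle, so w(r) = 1 - |r| / (width / 2) inside the base and 0 outside. ESD weighting (use_esds) is omitted, and the function names are hypothetical.

```python
import math


def triangle_weight(shift, half_width):
    """Triangle weight: 1 at zero shift, falling linearly to 0 at +/- half_width."""
    return max(0.0, 1.0 - abs(shift) / half_width)


def cross_correlation(f, g, step, half_width):
    """Weighted cross-correlation c_fg = sum_r w(r) sum_x f(x) g(x + r)."""
    n = len(f)
    max_shift = int(half_width / step)
    total = 0.0
    for k in range(-max_shift, max_shift + 1):
        w = triangle_weight(k * step, half_width)
        if w == 0.0:
            continue
        # Only indices where both f[i] and g[i + k] exist contribute.
        c = sum(f[i] * g[i + k] for i in range(max(0, -k), min(n, n - k)))
        total += w * c
    return total


def de_gelder_similarity(f, g, step, width=2.0):
    """Normalised similarity S = c_fg / sqrt(c_ff * c_gg); equals 1 for identical patterns."""
    half = width / 2.0  # assumption: width is the full base of the triangle
    cfg = cross_correlation(f, g, step, half)
    cff = cross_correlation(f, f, step, half)
    cgg = cross_correlation(g, g, step, half)
    return cfg / math.sqrt(cff * cgg)


step = 0.1
f = [0.0] * 100
g = [0.0] * 100
f[10] = 1.0
g[15] = 1.0  # the same sharp peak shifted by 0.5 degrees
print(de_gelder_similarity(f, g, step))  # 0.5
```

A shift of 0.5 degrees against a half-width of 1.0 degree lands halfway down the triangle, which is why the score is 0.5 rather than 0 as a plain point-by-point correlation would give.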

property tick_marks

The array of tick marks if this is a simulated powder pattern.

Returns:

list of ccdc.descriptors.CrystalDescriptors.PowderPattern.TickMark or None if this is not a simulated powder pattern.

property two_theta

The array of two_theta values.

write_raw_file(file_name)[source]

Write a Bruker .raw file.

Parameters:

file_name – output file name

write_xrdml_file(file_name)[source]

Write a Panalytical .xrdml format file.

Parameters:

file_name – output file name

write_xy_file(file_name)[source]

Write a .xy format file.

The .xy format is the same as the .xye format except that the ESD column is omitted.

Parameters:

file_name – output file name

write_xye_file(file_name)[source]

Write a .xye format file.

Parameters:

file_name – output file name

class ccdc.descriptors.StatisticalDescriptors[source]

A namespace holding statistical descriptors.

class RankStatistics(scores, activity_column=None)[source]

Represents a ranked collection of values for which statistics can be derived.

ACC(fraction=0.0)[source]

Calculate accuracy metric (ACC) at the specified fraction.

ACC = (TP+TN) / (TP+FP+TN+FN)

Parameters:

fraction – position within the data, given as a fraction in [0, 1], at which the accuracy metric is determined.

Raises:

ValueError if fraction is not within interval [0,1]
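The sketch below shows one way to read fraction: the top fraction of the ranked list is called active and everything below it inactive, after which ACC follows from the confusion-matrix counts. Rounding fraction × N to the nearest position and the boolean-list input are assumptions, and confusion_at_fraction and accuracy are hypothetical names.

```python
def confusion_at_fraction(ranked_actives, fraction):
    """Counts (TP, FP, TN, FN) when the top `fraction` of a ranked list is called active.

    ranked_actives: sequence of booleans, best-scored item first (True = active).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be within [0, 1]")
    n_selected = int(round(fraction * len(ranked_actives)))  # assumed rounding
    selected = ranked_actives[:n_selected]
    rest = ranked_actives[n_selected:]
    tp = sum(1 for a in selected if a)
    fp = len(selected) - tp
    fn = sum(1 for a in rest if a)
    tn = len(rest) - fn
    return tp, fp, tn, fn


def accuracy(ranked_actives, fraction):
    """ACC = (TP + TN) / (TP + FP + TN + FN)."""
    tp, fp, tn, fn = confusion_at_fraction(ranked_actives, fraction)
    return (tp + tn) / (tp + fp + tn + fn)


ranked = [True, True, False, True, False, False, False, False, False, False]
print(accuracy(ranked, 0.2), accuracy(ranked, 0.5))
```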

AUC()[source]

Calculate the area under the ROC curve.

Returns:

Area under the ROC curve.

BEDROC(alpha=0.0)[source]

Calculate Boltzmann-Enhanced Discrimination of ROC (BEDROC) as defined in:

Truchon J., Bayly C.I., “Evaluating Virtual Screening Methods: Good and Bad Metrics for the ‘Early Recognition’ Problem” J. Chem. Inf. Model. 47:488-508 (2007).

Parameters:

alpha – exponential weighting factor.

Raises:

ValueError if alpha is less than or equal to 0.0.

EF(fraction=0.0)[source]

Calculate enrichment factor (EF) at the specified fraction.

Parameters:

fraction – position within the data, given as a fraction in [0, 1], at which the enrichment factor is determined.

Raises:

ValueError if fraction is not within interval [0,1]

PPV(fraction=0.0)[source]

Calculate precision or positive predictive value (PPV) at the specified fraction.

Parameters:

fraction – position within the data, given as a fraction in [0, 1], at which precision is determined.

Raises:

ValueError if fraction is not within interval [0,1]
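Under the same reading of fraction (the top fraction of the ranked list is selected), precision is the active rate within the selection, and the enrichment factor divides that by the active rate of the whole list. The sketch is hypothetical; in particular, returning 0.0 for an empty selection is a convention chosen here, not documented behaviour.

```python
def top_fraction(ranked_actives, fraction):
    """The selected head of a ranked boolean list (best-scored first)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be within [0, 1]")
    n_selected = int(round(fraction * len(ranked_actives)))  # assumed rounding
    return ranked_actives[:n_selected]


def ppv(ranked_actives, fraction):
    """Precision: fraction of the selected items that are active."""
    top = top_fraction(ranked_actives, fraction)
    return sum(top) / len(top) if top else 0.0


def enrichment_factor(ranked_actives, fraction):
    """EF: precision in the selection divided by the active rate of the whole list."""
    base_rate = sum(ranked_actives) / len(ranked_actives)
    return ppv(ranked_actives, fraction) / base_rate


ranked = [True, True, False, True, False, False, False, False, False, False]
print(ppv(ranked, 0.2), enrichment_factor(ranked, 0.2))
```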

RIE(alpha=0.0)[source]

Calculate robust initial enhancement (RIE) as defined in:

Sheridan R.P., Singh S.B., Fluder E.M., Kearsley S.K., “Protocols for Bridging the Peptide to Nonpeptide Gap in Topological Similarity Searches” J. Chem. Inf. Comp. Sci. 41:1395-1406 (2001).

Parameters:

alpha – exponential weighting factor

Raises:

ValueError if alpha is less than or equal to 0.0
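RIE compares the exponentially weighted ranks of the actives with their expected value under random ordering, and BEDROC rescales RIE linearly between its minimum and maximum attainable values so that it lies in [0, 1]. The sketch below follows those published definitions with 1-based ranks; rie and bedroc are hypothetical names, and the sketch assumes the list contains at least one active and one inactive.

```python
import math


def rie(ranked_actives, alpha):
    """Robust initial enhancement (Sheridan et al., 2001).

    ranked_actives: booleans, best-scored first; ranks r_i are 1-based.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be greater than 0.0")
    n_total = len(ranked_actives)
    ratio = sum(ranked_actives) / n_total
    observed = sum(math.exp(-alpha * (i + 1) / n_total)
                   for i, a in enumerate(ranked_actives) if a)
    # Expected value of the sum if the actives were placed uniformly at random.
    expected = ratio * (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    return observed / expected


def bedroc(ranked_actives, alpha):
    """BEDROC (Truchon and Bayly, 2007): RIE rescaled onto [0, 1]."""
    ratio = sum(ranked_actives) / len(ranked_actives)
    r = rie(ranked_actives, alpha)
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (r - rie_min) / (rie_max - rie_min)


top = [True] * 3 + [False] * 7     # all actives ranked first
bottom = [False] * 7 + [True] * 3  # all actives ranked last
print(bedroc(top, 20.0), bedroc(bottom, 20.0))
```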

ROC()[source]

Calculate receiver operating characteristic (ROC) curve.

Returns:

Two lists: the true positive rates and the false positive rates.
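The ROC curve can be traced by moving a threshold down the ranked list and recording the true and false positive rates after each item; the AUC then follows from the trapezoidal rule. The sketch assumes no tied scores and a boolean list with the best-scored item first; roc_curve and auc are hypothetical names.

```python
def roc_curve(ranked_actives):
    """True and false positive rates as the threshold moves down the ranked list.

    ranked_actives: booleans, best-scored first (no tied scores assumed; needs
    at least one active and one inactive). Returns (tpr, fpr), each running
    from 0.0 to 1.0.
    """
    n_act = sum(ranked_actives)
    n_inact = len(ranked_actives) - n_act
    tp = fp = 0
    tpr, fpr = [0.0], [0.0]
    for is_active in ranked_actives:
        if is_active:
            tp += 1
        else:
            fp += 1
        tpr.append(tp / n_act)
        fpr.append(fp / n_inact)
    return tpr, fpr


def auc(ranked_actives):
    """Area under the ROC curve by the trapezoidal rule."""
    tpr, fpr = roc_curve(ranked_actives)
    return sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2.0
               for i in range(len(tpr) - 1))


print(auc([True, False, True, False]))
```

For untied data this equals the fraction of (active, inactive) pairs in which the active is ranked higher.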

property activity_column

The extractor used to obtain the active/inactive classification from the scores data.