Working with Crystal Structure Predictions¶
Introduction¶
The main class of the ccdc.csp.prediction module is ccdc.csp.prediction.Prediction.
A ccdc.csp.prediction.Prediction represents a predicted crystal structure together with its associated metadata and computed properties. It will normally be provided by a ccdc.csp.database.CspDatabase but can also be loaded from a CIF file with CSP annotations.
Accessing the crystal structure¶
First we need to open the CSP database, which gives access to the structural information as well as to the CSP metadata and computed properties. To do so, import the ccdc.csp.database.CspDatabase class and read in the CSP database.
>> from ccdc.csp.database import CspDatabase
>> csp_db = CspDatabase("C:\\DATABASES\\CSP_database_01Feb2021.csdsqlx", "http://int-csp02.ccdc.cam.ac.uk:8277")
Now let’s access the first prediction of the first CSP landscape in the CSP database. It contains a predicted crystal structure together with its associated metadata and computed properties:
>> landscapes = csp_db.landscape_names
>> print(landscapes)
('Triptycene', 'XXIII_1', 'XXIII_2', 'Axitinib', 'XXIV')
>> first_landscape_name = landscapes[0]
>> print(first_landscape_name)
Triptycene
>> landscape_prediction_ids = csp_db.prediction_identifiers(first_landscape_name)
>> first_prediction_id = landscape_prediction_ids[0]
>> print(first_prediction_id)
triptycene_1
>> first_prediction = csp_db.prediction(first_prediction_id)
Alternatively, to create a list of all the predictions in the first landscape:
>> first_landscape_predictions = [csp_db.prediction(pred_id) for pred_id in csp_db.prediction_identifiers(first_landscape_name)]
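The same pattern extends to every landscape in a database. The following is a minimal sketch that runs without a CSP database: StubCspDatabase is a hypothetical stand-in that only mimics the landscape_names, prediction_identifiers and prediction members used above, and every identifier other than triptycene_1 is an invented placeholder.

```python
class StubCspDatabase:
    """Hypothetical stand-in exposing only the members used in this section."""

    def __init__(self, landscapes):
        # landscapes: dict mapping landscape name -> list of prediction ids
        self._landscapes = landscapes

    @property
    def landscape_names(self):
        return tuple(self._landscapes)

    def prediction_identifiers(self, landscape_name):
        return list(self._landscapes[landscape_name])

    def prediction(self, identifier):
        # A real database would return a Prediction; the stub echoes the id.
        return identifier


def predictions_by_landscape(db):
    """Return {landscape name: [prediction, ...]} for every landscape in db."""
    return {
        name: [db.prediction(pid) for pid in db.prediction_identifiers(name)]
        for name in db.landscape_names
    }


stub_db = StubCspDatabase({
    "Triptycene": ["triptycene_1", "triptycene_2"],  # second id is a placeholder
    "Axitinib": ["axitinib_1"],                      # placeholder id
})
grouped = predictions_by_landscape(stub_db)
print(grouped["Triptycene"])
```

Because predictions_by_landscape only relies on those three members, passing an opened CspDatabase in place of stub_db should work the same way, assuming its members behave as shown in the session above.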
The entry, an instance of the ccdc.entry.Entry class that represents an entry of a CSD database, is available via ccdc.csp.prediction.Prediction.entry.
Analogously, the crystallographic properties of a crystal (ccdc.crystal.Crystal) can be accessed via ccdc.csp.prediction.Prediction.crystal.
Similarly, the chemistry is exposed through the three classes of the ccdc.molecule module (ccdc.molecule.Atom, ccdc.molecule.Bond and ccdc.molecule.Molecule), which are accessible via the ccdc.csp.prediction.Prediction.molecule.atoms, ccdc.csp.prediction.Prediction.molecule.bonds and ccdc.csp.prediction.Prediction.molecule attributes, respectively.
So, for example, we can check the Z’ or the space group of the triptycene_1 prediction:
>> print(first_prediction.crystal.z_prime)
1.0
>> print(first_prediction.crystal.spacegroup_symbol)
P21/c
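These two attributes are often the first thing to summarise across a whole landscape. The sketch below tallies space groups and selects Z’ = 1 structures. It uses SimpleNamespace objects as stand-ins for predictions, carrying only the crystal.z_prime and crystal.spacegroup_symbol attributes shown above; all values apart from the first entry are invented for illustration.

```python
from collections import Counter
from types import SimpleNamespace


def make_prediction(z_prime, spacegroup_symbol):
    """Build a stand-in object with the two crystal attributes used here."""
    crystal = SimpleNamespace(z_prime=z_prime, spacegroup_symbol=spacegroup_symbol)
    return SimpleNamespace(crystal=crystal)


# Illustrative data only; the first entry mirrors triptycene_1 above.
predictions = [
    make_prediction(1.0, "P21/c"),
    make_prediction(1.0, "P-1"),
    make_prediction(2.0, "P21/c"),
]

# Count how often each space group occurs in the landscape.
spacegroup_counts = Counter(p.crystal.spacegroup_symbol for p in predictions)

# Keep only structures with one molecule in the asymmetric unit.
z_prime_one = [p for p in predictions if p.crystal.z_prime == 1.0]

print(spacegroup_counts.most_common())
print(len(z_prime_one))
```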
Accessing the computed properties¶
During the ingestion of the predicted crystal structures into the CSP database some properties (e.g. void volume in the unit cell or predicted BFDH morphology) are pre-computed and stored in the CSP database as part of the prediction. The ingested computed properties are:
The BFDH predicted morphology, ccdc.csp.prediction.Prediction.BFDH_form.
>> print(first_prediction.BFDH_form)
block
The packing coefficient of the crystal, ccdc.csp.prediction.Prediction.packing_coefficient. It measures the proportion of the unit cell occupied by atoms, as a fraction between zero (unoccupied) and one (completely filled).
The void volume of the crystal in the unit cell and the percentage of the unit cell volume it represents, ccdc.csp.prediction.Prediction.void_volume and ccdc.csp.prediction.Prediction.void_percent.
>> packing_coefficient = first_prediction.packing_coefficient
>> print(packing_coefficient)
0.724
>> void_volume = first_prediction.void_volume
>> void_percentage = first_prediction.void_percent
>> print('The crystal structure has a void volume of {0} cubic Angstroms, representing {1} % of the unit cell volume'.format(void_volume, void_percentage))
The crystal structure has a void volume of 23.4 cubic Angstroms, representing 2.1 % of the unit cell volume
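The two void properties are linked through the unit cell volume. As a rough worked example, assuming void_percent is simply the void volume divided by the unit cell volume and multiplied by 100, the printed values can be used to back-calculate the cell volume. The helper names below are hypothetical, and since the inputs are rounded the result is only approximate.

```python
def void_percentage(void_volume, cell_volume):
    """Void volume expressed as a percentage of the unit cell volume."""
    if cell_volume <= 0:
        raise ValueError("cell_volume must be positive")
    return 100.0 * void_volume / cell_volume


def implied_cell_volume(void_volume, void_percent):
    """Unit cell volume implied by a void volume and its percentage."""
    if void_percent <= 0:
        raise ValueError("void_percent must be positive")
    return void_volume / (void_percent / 100.0)


# Rounded values taken from the printed output above.
cell_volume = implied_cell_volume(23.4, 2.1)
round_trip = void_percentage(23.4, cell_volume)
print(round(cell_volume, 1), "cubic Angstroms (approximate)")
```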
Average and maximum difference of the molecular shape descriptor (ccdc.csp.prediction.Prediction.molecular_shape_average and ccdc.csp.prediction.Prediction.molecular_shape_difference) of the heaviest component molecules in the asymmetric unit, in Angstroms. The molecular shape descriptor of a given molecule is computed from the long (laxis), medium (maxis) and small (saxis) axes of the molecular bounding box as laxis * saxis / maxis.
>> print(first_prediction.molecular_shape_average)
14.6
>> print(first_prediction.molecular_shape_difference)
0.0
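To make the descriptor concrete, here is a small sketch that evaluates laxis * saxis / maxis for bounding boxes given as three edge lengths. The box dimensions are invented, and treating the "maximum difference" as the largest descriptor minus the smallest is an assumption rather than something the text above specifies. With a single molecule the difference is zero, which is consistent with the 0.0 printed for this Z’ = 1 structure.

```python
def shape_descriptor(box_dimensions):
    """Return laxis * saxis / maxis for a bounding box of three edge lengths."""
    saxis, maxis, laxis = sorted(box_dimensions)
    return laxis * saxis / maxis


def shape_summary(boxes):
    """Average descriptor and (assumed) max-minus-min difference over molecules."""
    descriptors = [shape_descriptor(box) for box in boxes]
    average = sum(descriptors) / len(descriptors)
    difference = max(descriptors) - min(descriptors)
    return average, difference


# Invented bounding boxes, in Angstroms.
single = shape_summary([(4.0, 8.0, 10.0)])
pair = shape_summary([(4.0, 8.0, 10.0), (3.0, 6.0, 12.0)])
print(single)
print(pair)
```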
The list of hydrogen bond interactions, ccdc.csp.prediction.Prediction.hydrogen_bonds.
>> hbonds = first_prediction.hydrogen_bonds
>> print(len(hbonds))
2
>> for hb in hbonds:
...     print('An {4} HB interaction, {0}...{1}, with a D-A distance and angle of {2} Ang. and {3} degrees.'.format(hb.donor, hb.acceptor, hb.distance, hb.angle, hb.molecule_relationship))
An intermolecular HB interaction, N3...N4, with a D-A distance and angle of 3.035 Ang. and 157.51 degrees.
An intermolecular HB interaction, N1...O1, with a D-A distance and angle of 2.871 Ang. and 167.63 degrees.
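The hydrogen bond list lends itself to simple geometric filtering. The sketch below uses a namedtuple as a stand-in carrying the five attributes printed above, populated with the two interactions from the output. The distance and angle cut-offs are arbitrary illustrative thresholds, not values recommended by the CCDC.

```python
from collections import namedtuple

# Stand-in record with the attributes used in the loop above.
HBond = namedtuple("HBond", "donor acceptor distance angle molecule_relationship")

hbonds = [
    HBond("N3", "N4", 3.035, 157.51, "intermolecular"),
    HBond("N1", "O1", 2.871, 167.63, "intermolecular"),
]


def short_linear_hbonds(hbonds, max_distance=3.0, min_angle=150.0):
    """Keep interactions below max_distance (Ang.) and above min_angle (degrees)."""
    return [
        hb for hb in hbonds
        if hb.distance <= max_distance and hb.angle >= min_angle
    ]


selected = short_linear_hbonds(hbonds)
for hb in selected:
    print("{0}...{1}".format(hb.donor, hb.acceptor))
```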
Accessing the CSP metadata fields¶
When a prediction is read from a CIF file with CSP annotations, the CSP metadata fields are stored on the prediction as follows:
The ccdc.csp.prediction.Prediction.simulation_temperature attribute contains the temperature, in kelvin (K), at which the simulation (prediction) of the crystal structure was made. The permitted range is from 0.0 upwards, with no upper limit.
The ccdc.csp.prediction.Prediction.optimisation_energy_model attribute contains the highest-level energy model used for crystal structure optimisation. The data value must be one of the following: (i) Force Field, (ii) Semi-empirical, (iii) DFT, (iv) Wavefunction, (v) ML/AI, (vi) Other. Additional description of each method is provided via the ccdc.csp.prediction.Prediction.optimisation_energy_model_definition attribute.
The ccdc.csp.prediction.Prediction.optimisation_energy_model_definition attribute contains an additional description of the energy model used for structure optimisation. For example, if ‘Force Field’ was the value of ccdc.csp.prediction.Prediction.optimisation_energy_model, this field could be used to capture information about the force field name or the highest electrostatic multipole (L0, L1, L2, L3,…).
The ccdc.csp.prediction.Prediction.classification_energy_relative attribute contains the relative lattice energy, in kJ/mol, of the predicted crystal structure with respect to the global minimum on the lattice or absolute energy landscape at the same simulation temperature, ccdc.csp.prediction.Prediction.simulation_temperature.
The ccdc.csp.prediction.Prediction.free_energy_method attribute contains the method of free energy correction, including whether atomic positions were relaxed (all-atoms) or not (rigid-molecule). The data value must be one of the following: (i) Harmonic rigid-molecule (or Harmonic all-atoms), (ii) Anharmonic rigid-molecule (or Anharmonic all-atoms), (iii) Quasi-harmonic rigid-molecule (or Quasi-harmonic all-atoms), (iv) Thermal averaging, (v) Zero-point energy rigid-molecule (or Zero-point energy all-atoms), (vi) Other. If the defined list of values is not currently appropriate, contact us at hello@ccdc.cam.ac.uk to help define a new value.
The ccdc.csp.prediction.Prediction.free_energy_relative attribute contains the relative free energy, in kJ/mol, at a given temperature, of the structure with respect to the corresponding global minimum on the same free energy landscape.
The ccdc.csp.prediction.Prediction.matched_refcode attribute contains the database code of an experimental structure that is an exact or partial structural match of the crystal structure in this prediction. The creator of a CIF will not normally specify this data item; it will be added by the database provider (CCDC).
The ccdc.csp.prediction.Prediction.refcode_match attribute contains the ratio between the number of molecules matched and the total number of molecules in the overlay molecule cluster. A value of 1 indicates a perfect match of N out of N molecules. Values lower than 1 indicate a partial match of M out of N molecules, with M < N. The creator of a CIF will not normally specify this data item; it will be added by the database provider (CCDC).
The ccdc.csp.prediction.Prediction.refcode_match_rmsd attribute contains the root mean squared deviation (RMSD) of atomic positions (including hydrogen atoms), in Angstroms, for the default overlay of matching clusters of 20 molecules per chemically different component.
The universal definition of CSP annotations in a CIF file, as well as of the CSP metadata fields, is a work in progress led by the CCDC via the CCDC CSPC consortium.
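The constraints listed above for the simulation temperature and the optimisation energy model can be checked before a CIF is submitted. The following is a minimal sketch of such a check. The function name and the message wording are hypothetical, and the permitted values are copied from the list above rather than read from any CCDC dictionary.

```python
# Permitted values for optimisation_energy_model, as listed in this section.
PERMITTED_ENERGY_MODELS = (
    "Force Field",
    "Semi-empirical",
    "DFT",
    "Wavefunction",
    "ML/AI",
    "Other",
)


def check_csp_metadata(simulation_temperature, optimisation_energy_model):
    """Return a list of human-readable problems; empty if both values are valid."""
    problems = []
    if simulation_temperature < 0.0:
        problems.append("simulation_temperature must be 0.0 K or higher")
    if optimisation_energy_model not in PERMITTED_ENERGY_MODELS:
        problems.append(
            "optimisation_energy_model must be one of: "
            + ", ".join(PERMITTED_ENERGY_MODELS)
        )
    return problems


print(check_csp_metadata(0.0, "Force Field"))
print(check_csp_metadata(-5.0, "Molecular Mechanics"))
```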
So, for example, we can query the details of the method used to rank the predicted structures by energy, the relative lattice energy, and the temperature at which the first prediction was carried out:
>> print(first_prediction.optimisation_energy_model_definition)
Williams'98 Electrostatic multipoles L4
>> print(first_prediction.classification_energy_relative)
0.0
>> print(first_prediction.simulation_temperature)
0.0
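A relative lattice energy of 0.0 marks the global minimum of its landscape, so this field is a natural key for ranking. The sketch below sorts stand-in predictions by classification_energy_relative and keeps those within an energy window. The identifiers, energies and the 5 kJ/mol window are all invented for illustration.

```python
from types import SimpleNamespace

# Stand-in predictions keyed by placeholder identifiers; energies in kJ/mol.
predictions = {
    "structure_a": SimpleNamespace(classification_energy_relative=3.2),
    "structure_b": SimpleNamespace(classification_energy_relative=0.0),
    "structure_c": SimpleNamespace(classification_energy_relative=7.9),
}


def low_energy_ids(predictions, window_kj_mol):
    """Identifiers within window_kj_mol of the global minimum, lowest first."""
    ranked = sorted(
        predictions.items(),
        key=lambda item: item[1].classification_energy_relative,
    )
    return [
        pred_id
        for pred_id, prediction in ranked
        if prediction.classification_energy_relative <= window_kj_mol
    ]


candidates = low_energy_ids(predictions, window_kj_mol=5.0)
print(candidates)
```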