Working with Crystal Structure Predictions


The main class of the ccdc.csp.prediction module is ccdc.csp.prediction.Prediction.

A ccdc.csp.prediction.Prediction represents a predicted crystal structure, together with its associated metadata and computed properties. It is normally provided by a ccdc.csp.database.CspDatabase, but can also be loaded from a CIF file with CSP annotations.

Accessing the crystal structure

To access the structural information, the CSP metadata and the computed properties, we first need to open a CSP database. Let us import the ccdc.csp.database.CspDatabase class and read in the database.

>> from ccdc.csp.database import CspDatabase
>> csp_db = CspDatabase("C:\\DATABASES\\CSP_database_01Feb2021.csdsqlx", "")

Now let's access the first prediction of the first CSP landscape in the CSP database. A prediction contains a predicted crystal structure, its associated metadata and its computed properties:

>> landscapes = csp_db.landscape_names
>> print(landscapes)
('Triptycene', 'XXIII_1', 'XXIII_2', 'Axitinib', 'XXIV')
>> first_landscape_name = landscapes[0]
>> print(first_landscape_name)
>> landscape_prediction_ids = csp_db.prediction_identifiers(first_landscape_name)
>> first_prediction_id = landscape_prediction_ids[0]
>> print(first_prediction_id)
>> first_prediction = csp_db.prediction(first_prediction_id)

Alternatively, to build a list of all the predictions in the first landscape:

>> first_landscape_predictions = [csp_db.prediction(pid) for pid in csp_db.prediction_identifiers(first_landscape_name)]
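The same pattern extends to every landscape in the database. The following is a minimal sketch, not part of the CSD Python API: it relies only on the landscape_names, prediction_identifiers and prediction members used above. FakeCspDatabase, the identifier attribute on the stand-in predictions, and the identifiers other than triptycene_1 are made up so that the example runs without a CSP database.

```python
from types import SimpleNamespace


def predictions_by_landscape(csp_db):
    """Map each landscape name to the list of its predictions."""
    return {
        name: [csp_db.prediction(pid) for pid in csp_db.prediction_identifiers(name)]
        for name in csp_db.landscape_names
    }


class FakeCspDatabase:
    """Hypothetical in-memory stand-in for CspDatabase (demo only)."""

    def __init__(self, landscapes):
        # {landscape name: [prediction identifier, ...]}
        self._landscapes = landscapes

    @property
    def landscape_names(self):
        return tuple(self._landscapes)

    def prediction_identifiers(self, landscape_name):
        return list(self._landscapes[landscape_name])

    def prediction(self, identifier):
        # A real database returns a Prediction; here we only keep the identifier.
        return SimpleNamespace(identifier=identifier)


# Made-up landscape contents, for illustration only.
demo_db = FakeCspDatabase(
    {"Triptycene": ["triptycene_1", "triptycene_2"], "XXIV": ["xxiv_1"]}
)
grouped = predictions_by_landscape(demo_db)
for name, predictions in grouped.items():
    print(name, len(predictions))
```

With a real database, passing csp_db instead of demo_db should work in the same way, although loading every prediction of a large landscape may be slow.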

The entry (a ccdc.entry.Entry, the class representing an entry of a CSD database) is available via ccdc.csp.prediction.Prediction.entry.

Analogously, the crystallographic properties of a crystal (ccdc.crystal.Crystal) can be accessed via ccdc.csp.prediction.Prediction.crystal.

Similarly, the chemistry is described by the three classes of the ccdc.molecule module: the molecule (ccdc.molecule.Molecule) is accessible via ccdc.csp.prediction.Prediction.molecule, and its atoms (ccdc.molecule.Atom) and bonds (ccdc.molecule.Bond) via ccdc.csp.prediction.Prediction.molecule.atoms and ccdc.csp.prediction.Prediction.molecule.bonds, respectively.

So, for example, we can check the Z’ and the space group of the triptycene_1 prediction:

>> print(first_prediction.crystal.z_prime)
>> print(first_prediction.crystal.spacegroup_symbol)
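These two attributes are often more informative when tallied over a whole landscape. The sketch below counts predictions per (space group, Z’) pair. It assumes only the crystal.spacegroup_symbol and crystal.z_prime attributes shown above; fake_prediction and the values passed to it are hypothetical stand-ins so that the example runs without a CSP database.

```python
from collections import Counter
from types import SimpleNamespace


def spacegroup_zprime_counts(predictions):
    """Count predictions per (space group symbol, Z') pair."""
    return Counter(
        (p.crystal.spacegroup_symbol, p.crystal.z_prime) for p in predictions
    )


def fake_prediction(spacegroup_symbol, z_prime):
    """Hypothetical stand-in exposing only the attributes used above."""
    crystal = SimpleNamespace(spacegroup_symbol=spacegroup_symbol, z_prime=z_prime)
    return SimpleNamespace(crystal=crystal)


# Made-up values, for illustration only.
demo_predictions = [
    fake_prediction("P21/c", 1.0),
    fake_prediction("P21/c", 1.0),
    fake_prediction("P-1", 2.0),
]
counts = spacegroup_zprime_counts(demo_predictions)
for (symbol, z_prime), n in counts.most_common():
    print(symbol, z_prime, n)
```

With real data, the list built earlier (first_landscape_predictions) could be passed in place of demo_predictions.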

Accessing the computed properties

When predicted crystal structures are ingested into the CSP database, some properties (e.g. void volume in the unit cell or predicted BFDH morphology) are pre-computed and stored in the database as part of each prediction. These ingested computed properties are accessed as attributes of the prediction:

>> print(first_prediction.BFDH_form)
>> packing_coefficient = first_prediction.packing_coefficient
>> print(packing_coefficient)
>> void_volume = first_prediction.void_volume
>> void_percentage = first_prediction.void_percent
>> print('The crystal structure has a void volume of {0} cubic Angstroms, representing {1} % of the unit cell volume'.format(void_volume, void_percentage))
The crystal structure has a void volume of 23.4 cubic Angstroms, representing 2.1 % of the unit cell volume
>> print(first_prediction.molecular_shape_average)
>> print(first_prediction.molecular_shape_difference)
>> hbonds = first_prediction.hydrogen_bonds
>> print(len(hbonds))
>> for hb in hbonds:
...    print('An {4} HB interaction, {0}...{1}, with a D-A distance and angle of {2} Ang. and {3} degrees.'.format(hb.donor, hb.acceptor, hb.distance, hb.angle, hb.molecule_relationship))
An intermolecular HB interaction, N3...N4, with a D-A distance and angle of 3.035 Ang. and 157.51 degrees.
An intermolecular HB interaction, N1...O1, with a D-A distance and angle of 2.871 Ang. and 167.63 degrees.
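The hydrogen-bond records lend themselves to simple filtering. As a minimal sketch, the function below keeps only the intermolecular hydrogen bonds whose D-A distance is below a chosen cutoff. It assumes that each record exposes the donor, acceptor, distance, angle and molecule_relationship attributes used above, and that donor and acceptor print as atom labels. The 3.0 Ang. cutoff is arbitrary, and the stand-in records simply reuse the two hydrogen bonds printed above.

```python
from types import SimpleNamespace


def short_intermolecular_hbonds(hbonds, max_distance=3.0):
    """Return intermolecular hydrogen bonds with D-A distance <= max_distance (Ang.)."""
    return [
        hb
        for hb in hbonds
        if hb.molecule_relationship == "intermolecular" and hb.distance <= max_distance
    ]


# Stand-in records mirroring the two hydrogen bonds printed above.
demo_hbonds = [
    SimpleNamespace(donor="N3", acceptor="N4", distance=3.035, angle=157.51,
                    molecule_relationship="intermolecular"),
    SimpleNamespace(donor="N1", acceptor="O1", distance=2.871, angle=167.63,
                    molecule_relationship="intermolecular"),
]
short = short_intermolecular_hbonds(demo_hbonds)
for hb in short:
    print("{0}...{1}: {2} Ang., {3} degrees".format(hb.donor, hb.acceptor, hb.distance, hb.angle))
```

With a real prediction, first_prediction.hydrogen_bonds would take the place of demo_hbonds.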

Accessing the CSP metadata fields

When a prediction is read from a CIF file with CSP annotations, the CSP metadata fields are attached to the prediction and exposed as attributes.

The universal definition of CSP annotations in a CIF file, as well as of the CSP metadata fields, is a work in progress that the CCDC is leading via the CCDC CSPC consortium.

So, for example, we can query the details of the method used to rank the energies of the predicted structures, the relative lattice energy of the first prediction, and the temperature at which that prediction was carried out:

>> print(first_prediction.optimisation_energy_model_definition)
Williams'98 Electrostatic multipoles L4
>> print(first_prediction.classification_energy_relative)
>> print(first_prediction.simulation_temperature)
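A common use of the relative lattice energy is to order the predictions of a landscape from most to least stable. The following sketch assumes that classification_energy_relative holds a number, or None when the field was not annotated. That None convention is an assumption, as are the identifier attribute and all the values of the stand-in predictions.

```python
from types import SimpleNamespace


def rank_by_relative_energy(predictions):
    """Sort predictions by ascending relative energy, skipping unannotated ones."""
    annotated = [p for p in predictions if p.classification_energy_relative is not None]
    return sorted(annotated, key=lambda p: p.classification_energy_relative)


# Made-up identifiers and energies, for illustration only.
demo_predictions = [
    SimpleNamespace(identifier="demo_3", classification_energy_relative=4.2),
    SimpleNamespace(identifier="demo_1", classification_energy_relative=0.0),
    SimpleNamespace(identifier="demo_2", classification_energy_relative=None),
]
ranked = rank_by_relative_energy(demo_predictions)
for p in ranked:
    print(p.identifier, p.classification_energy_relative)
```

Skipping unannotated predictions rather than raising an error is a design choice for this sketch; a stricter workflow might prefer to fail loudly when the energy is missing.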