Working with Crystal Structure Prediction Databases

Introduction

The main class of the ccdc.csp.database module is ccdc.csp.database.CspDatabase.

A ccdc.csp.database.CspDatabase represents a database of crystal structure predictions. The database can hold predictions for many different molecules, with sets of predictions for a particular molecule being organised into landscapes.

Accessing the CSP database

A CSP database currently consists of two distinct parts with different connection methods. This is a temporary arrangement that will be replaced by a single database in the future. The two parts are identified as follows:

  1. A crystal structure database file in csdsqlx format, “my_predictions.csdsqlx”, which must be visible in the file system on which a CSD-Theory Python script is run.

  2. A CSP metadata database, “https://my.prediction.metadata.server:2468”, which is accessed via a web-service URL. This web-service will usually be hosted on a non-public web server that is visible to CSP users. The web-service URL will normally consist of a hostname and port number; optionally, it may also include an extra path.
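Before connecting, it can be useful to check both parts up front. The following is only a sketch and not part of the CSD Python API: check_csp_locations is a hypothetical helper that uses the Python standard library to confirm that the structure file has the csdsqlx extension and is visible on the file system, and that the web-service URL contains a hostname and a port number. The messages it returns are illustrative.

```python
import os
from urllib.parse import urlsplit


def check_csp_locations(structure_file, metadata_url):
    """Hypothetical helper (not part of the ccdc API).

    Returns a list of human-readable problems; an empty list means
    both locations look plausible.
    """
    problems = []

    # Part 1: the crystal structure database file
    if not structure_file.lower().endswith('.csdsqlx'):
        problems.append('structure file does not have the .csdsqlx extension')
    if not os.path.isfile(structure_file):
        problems.append('structure file is not visible on this file system')

    # Part 2: the metadata web-service URL
    try:
        parts = urlsplit(metadata_url)
        scheme, host, port = parts.scheme, parts.hostname, parts.port
    except ValueError:
        # Malformed URLs (for example a non-numeric port) end up here
        problems.append('metadata URL could not be parsed')
        return problems
    if scheme not in ('http', 'https'):
        problems.append('metadata URL should start with http:// or https://')
    if not host:
        problems.append('metadata URL has no hostname')
    if port is None:
        # The URL normally includes a port, so this is worth flagging
        problems.append('metadata URL has no explicit port number')
    return problems


for problem in check_csp_locations('my_predictions.csdsqlx',
                                   'https://my.prediction.metadata.server:2468'):
    print(problem)
```

An empty result does not guarantee that the web-service is reachable; it only shows that the two locations are well formed.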

First, we need to open the CSP database in order to access the structural information as well as the CSP metadata and computed properties. Let us therefore import the ccdc.csp.database.CspDatabase class and read in the CSP database, passing the path to the csdsqlx file followed by the web-service URL. Help on the ccdc.csp.database.CspDatabase class can be retrieved with Python's built-in help() function.

>> from ccdc.csp.database import CspDatabase
>> csp_db = CspDatabase("C:\\DATABASES\\CSP_database_01Feb2021.csdsqlx", "http://int-csp02.ccdc.cam.ac.uk:8277")

Now let’s look at the number of landscapes in the CSP database and the identifier of each CSP landscape.

>> landscapes = csp_db.landscape_names
>> print('There are {0} landscapes in the CSP database:'.format(len(landscapes)))
There are 5 landscapes in the CSP database:
>> for number, name in enumerate(landscapes, start=1):
..     print('Identifier of landscape number {0}: {1}'.format(number, name))
Identifier of landscape number 1: Axitinib
Identifier of landscape number 2: Triptycene
Identifier of landscape number 3: XXIII_L1
Identifier of landscape number 4: XXIV
Identifier of landscape number 5: XXV

We can also retrieve the number of predictions in a given landscape, e.g. Axitinib.

>> landscape_id = 'Axitinib'
>> axitinib_identifiers = csp_db.prediction_identifiers(landscape_id)
>> print('There are {0} predictions in the {1} CSP landscape'.format(len(axitinib_identifiers), landscape_id))
There are 250 predictions in the Axitinib CSP landscape
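The landscape names and the per-landscape prediction identifiers can be combined to tabulate the size of every landscape at once. The following is a minimal sketch: count_predictions is a hypothetical helper, not part of the ccdc API, and it relies only on the landscape_names attribute and the prediction_identifiers method used above. StubDatabase and its contents are invented stand-ins so that the example runs without a CSP database; in practice csp_db would be passed instead.

```python
def count_predictions(db):
    """Hypothetical helper: map each landscape name to its number of predictions."""
    return {name: len(db.prediction_identifiers(name))
            for name in db.landscape_names}


class StubDatabase:
    """Invented stand-in for a CspDatabase, for illustration only."""

    def __init__(self, landscapes):
        # landscapes: dict mapping landscape name -> list of identifiers
        self._landscapes = landscapes

    @property
    def landscape_names(self):
        return list(self._landscapes)

    def prediction_identifiers(self, landscape_name):
        return self._landscapes[landscape_name]


# The names and identifiers below are made up for the example
stub = StubDatabase({'A': ['a_00001', 'a_00002'], 'B': ['b_00001']})
counts = count_predictions(stub)
for name, count in sorted(counts.items()):
    print('{0}: {1} predictions'.format(name, count))
```

Because the helper only touches the two members shown in this section, it should work unchanged with a real database object, assuming those members behave as they do in the session above.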

The prediction identifiers themselves can also be accessed. For example, to print the first prediction identifier in the ‘Axitinib’ landscape:

>> first_id = axitinib_identifiers[0]
>> print('The first identifier in the {0} landscape is: {1}'.format(landscape_id, first_id))
The first identifier in the Axitinib landscape is: axitinib_00001