Solubility Platform

Note

The Solubility Platform features are only available with a license that includes the Solubility feature.

Ingestion

We create a molecule object from which we can create a database entry object. We then update the database entry with experimental data.

Modules import

>>> from ccdc import entry
>>> from ccdc import io
>>> from ccdc import search

Creating a new entry

A database entry can be created from a file in a suitable input format, for example CIF, mol, or mol2, or from a SMILES string.

>>> # Create a database entry from a SMILES string.
>>> database_entry = entry.Entry.from_string('CC(C)Cc1ccc(cc1)C(C)C(O)=O')

Updating the entry

Various attributes of the newly created database entry can be updated with new values.

>>> # To set a unique in-house identifier
>>> database_entry.identifier = 'compound123'
>>> # The following might be deduced automatically from the input file
>>> # But they can also be updated explicitly if required
>>> database_entry.chemical_name = 'ibuprofen'
>>> database_entry.formula = 'C13 H18 O2'
>>> database_entry.polymorph = 'polymorph 1'

Adding thermodynamic fields

Values for thermodynamic fields can be added to the database entry, for example:

>>> # Melting points and unit
>>> database_entry.input_melting_point_range = (175, 177, 'Deg.C')
>>> # Heat of fusion
>>> database_entry.heat_of_fusion = (276, None, 'J/g')
>>> database_entry.heat_of_fusion_notes = 'melted slowly in my hand'
>>> # Change in heat capacity
>>> database_entry.heat_capacity = (0.5, 'J/K')
>>> database_entry.heat_capacity_notes = "didn't burn too badly when picked up hot"
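
Thermodynamic values often arrive as text, for instance from a spreadsheet export. The following is a minimal sketch, using only the standard library, of turning such text fields into the tuple shapes used above. The helper name range_tuple is hypothetical and is not part of the CSD Python API.

```python
def range_tuple(minimum, maximum, unit):
    """Build a (min, max, unit) tuple from text fields.

    An empty maximum becomes None, matching the (276, None, 'J/g')
    form used for the heat of fusion above.
    """
    low = float(minimum)
    high = float(maximum) if maximum.strip() else None
    return (low, high, unit.strip())


# Hypothetical helper, not a ccdc function.
melting_point = range_tuple('175', '177', 'Deg.C')
heat_of_fusion = range_tuple('276', '', 'J/g')
print(melting_point)   # (175.0, 177.0, 'Deg.C')
print(heat_of_fusion)  # (276.0, None, 'J/g')
```

The resulting tuples could then be assigned to the corresponding entry attributes.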

Adding solubility table

Solubility measurements at different temperatures and in different solvents are recorded in a table.

>>> # Parameters are solubility and temperature. Solubility can be a single number, a (lower, upper) range tuple, or a string with a qualifier, e.g. '< 15'
>>> measurement1 = entry.SolubilityMeasurement((0, 0.1), 25)
>>> # Specify solvent name and percentage
>>> measurement1.add_solvent('water', 90)
>>> measurement1.add_solvent('ethanol', 10)
>>> # Do the same for another measurement
>>> measurement2 = entry.SolubilityMeasurement(1.1, 25, solubility_unit='?', temperature_unit='?', notes='post ingestion of measurement 1')
>>> measurement2.add_solvent('water', 100)
>>> # Now add to the database entry
>>> database_entry.solubility_data = [measurement1, measurement2]
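
This section does not say whether add_solvent checks that the percentages of a solvent mixture add up to 100, so it may be worth checking before building a measurement. The following is a minimal sketch of such a check; the helper name check_solvent_mix and the tolerance are assumptions, not part of the CSD Python API.

```python
def check_solvent_mix(solvents, tolerance=1e-6):
    """Return True if the solvent percentages add up to 100.

    solvents is a list of (name, percentage) pairs, the same pairs
    that would be passed one by one to add_solvent.
    """
    total = sum(percentage for _name, percentage in solvents)
    return abs(total - 100.0) <= tolerance


# Hypothetical pre-check, run before calling add_solvent for each pair.
mix = [('water', 90), ('ethanol', 10)]
print(check_solvent_mix(mix))              # True
print(check_solvent_mix([('water', 90)]))  # False
```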

Writing to database

You can create a new database or add the new entry to an existing database.

>>> with io.EntryWriter('database.csdsql', append=True) as database_writer:
...     database_writer.write(database_entry)

Updating Records

>>> database_entry.chemical_name = 'Ibuprofen'
>>> with io.EntryWriter('database.csdsql', append=True) as database_writer:
...     database_writer.write(database_entry)

Deleting Records

>>> with io.EntryWriter('database.csdsql', append=True) as database_writer:
...     database_writer.remove('compound123')

Example Script

An example script (ingestion.py) is provided to demonstrate how ingestion takes place for solubility data in csv files. The script expects the input to be in the following format.

  • Folder of CIF, mol or mol2 files, with the identifier as the file name

  • Input csv file for the compound and thermodynamic data with column headings:

    • Identifier, Compound name, Formula, Polymorph, Melting Temperature min, Melting Temperature max, Melting Temperature Unit, Melting Temperature notes, Enthalpy of Fusion min, Enthalpy of Fusion Unit, Enthalpy of Fusion max, Enthalpy of Fusion notes, Heat Capacity, Heat Capacity Unit, Heat Capacity notes

  • Folder containing csv files of solubility data with column headings:

    • Identifier, Solubility, Solubility Unit, Temperature, Temperature Unit, solvent 1, % solvent 1, solvent 2, % solvent 2, solvent 3, % solvent 3, solvent 4, % solvent 4, Notes
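
As an illustration of the solubility file layout, the following sketch reads one row with the column headings listed above and collects its solvent and percentage pairs. It uses only the standard library. The sample row is made up, the helper name solvents_from_row is hypothetical, and the exact spelling of the headings in a real file is an assumption.

```python
import csv
import io

# Made-up sample data using the column headings listed above.
SAMPLE = (
    "Identifier,Solubility,Solubility Unit,Temperature,Temperature Unit,"
    "solvent 1,% solvent 1,solvent 2,% solvent 2,"
    "solvent 3,% solvent 3,solvent 4,% solvent 4,Notes\n"
    "compound123,0.1,mg/mL,25,deg.C,water,90,ethanol,10,,,,,illustrative row\n"
)


def solvents_from_row(row):
    """Collect the non-empty (name, percentage) pairs from one CSV row."""
    pairs = []
    for index in range(1, 5):
        name = (row.get('solvent %d' % index) or '').strip()
        if name:
            pairs.append((name, float(row['%% solvent %d' % index])))
    return pairs


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row['Identifier'], row['Solubility'], solvents_from_row(row))
```

Each pair returned by the helper corresponds to one add_solvent call on a measurement.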

Running the script creates a new database at the file path given by the OUTPUT_DATABASE variable. If a database already exists at that path, the script will attempt to update it with the new entries. Note that the identifier of every entry in a database must be unique, so to regenerate the database from scratch you must delete the existing one before re-running the script.
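
Because identifiers must be unique, it can help to check the compound list for repeats before ingestion. The following is a minimal sketch using only the standard library; the helper name duplicate_identifiers is hypothetical and the sample data is made up, with only the first two columns shown.

```python
import csv
import io
from collections import Counter


def duplicate_identifiers(csv_text):
    """Return the identifiers that occur more than once, in sorted order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row['Identifier'].strip() for row in reader)
    return sorted(name for name, count in counts.items() if count > 1)


# Made-up compound list for illustration only.
sample = (
    "Identifier,Compound name\n"
    "compound123,ibuprofen\n"
    "compound124,aspirin\n"
    "compound123,Ibuprofen\n"
)
print(duplicate_identifiers(sample))  # ['compound123']
```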

Example script usage:

usage: ingestion.py [-h] [-c INPUT_CSV] [-i INPUT_DIRECTORY]
                    [-s SOLUBILITY_DIRECTORY] [-o OUTPUT_DATABASE]

optional arguments:
  -h, --help            show this help message and exit
  -c INPUT_CSV          a csv file containing the list of compounds (default:
                        ...)
  -i INPUT_DIRECTORY    a directory containing the input mol/mol2/CIF files
                        (default: ...)
  -s SOLUBILITY_DIRECTORY
                        a directory containing csv files of solubility
                        measurement data (default: ...)
  -o OUTPUT_DATABASE    the output database (default: ...)
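
For readers adapting the script, the following is a sketch of how a command line with these four options could be declared with argparse. It is not the source of ingestion.py. The defaults are left unset because they are elided in the usage text above, and the file names in the example call are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with the four options shown in the usage text."""
    parser = argparse.ArgumentParser(prog='ingestion.py')
    parser.add_argument('-c', dest='input_csv', metavar='INPUT_CSV',
                        help='a csv file containing the list of compounds')
    parser.add_argument('-i', dest='input_directory', metavar='INPUT_DIRECTORY',
                        help='a directory containing the input mol/mol2/CIF files')
    parser.add_argument('-s', dest='solubility_directory',
                        metavar='SOLUBILITY_DIRECTORY',
                        help='a directory containing csv files of solubility '
                             'measurement data')
    parser.add_argument('-o', dest='output_database', metavar='OUTPUT_DATABASE',
                        help='the output database')
    return parser


# Hypothetical file names; options that are not given stay None here.
args = build_parser().parse_args(['-c', 'compounds.csv', '-o', 'database.csdsql'])
print(args.input_csv, args.output_database)
```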

Searching for notes

A text/numeric search can be used to find text in each of the notes fields.

>>> notes_search = search.TextNumericSearch()
>>> notes_search.add_heat_capacity_notes("burn")
>>> [hit.identifier for hit in notes_search.search("database.csdsql")]
['compound123']
>>> notes_search.clear()
>>> notes_search.add_heat_of_fusion_notes("melt")
>>> [hit.identifier for hit in notes_search.search("database.csdsql")]
['compound123']
>>> notes_search.clear()
>>> notes_search.add_solubility_notes("ingestion")
>>> [hit.identifier for hit in notes_search.search("database.csdsql")]
['compound123']
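
The hits above are consistent with the default mode='anywhere' matching a fragment at any position in the notes text. The following plain-Python sketch illustrates that idea only. The real matching rules of TextNumericSearch, including case handling, are not described here, so the case-insensitive comparison is an assumption and matches_anywhere is a hypothetical helper.

```python
def matches_anywhere(note, fragment):
    """Return True if fragment occurs anywhere in note, ignoring case."""
    return fragment.lower() in note.lower()


# The notes stored on the example entry earlier in this document.
notes = {
    'heat_capacity_notes': "didn't burn too badly when picked up hot",
    'heat_of_fusion_notes': 'melted slowly in my hand',
    'solubility_notes': 'post ingestion of measurement 1',
}
print(matches_anywhere(notes['heat_capacity_notes'], 'burn'))    # True
print(matches_anywhere(notes['heat_of_fusion_notes'], 'melt'))   # True
print(matches_anywhere(notes['solubility_notes'], 'ingestion'))  # True
```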

Detailed API Documentation

This is a list of the classes and attributes in the ccdc.entry and ccdc.search modules of the CSD Python API that are specific to the Solubility Platform.

Entry

class ccdc.entry.Entry(_entry=None)[source]

A database entry.

property heat_capacity

Get or set the heat capacity.

Returns:

a two-item tuple containing the heat capacity value and units

When setting, pass a tuple with either a single value, or a value and a units string.

>>> from ccdc.entry import Entry
>>> e=Entry.from_string("OCO")
>>> e.heat_capacity=(54.55,)
>>> e.heat_capacity
(54.55, '')
>>> e.heat_capacity=(54.55,'J/K')
>>> e.heat_capacity
(54.55, 'J/K')

property heat_capacity_notes

Get or set the notes on the heat capacity.

>>> from ccdc.entry import Entry
>>> e=Entry.from_string("OCO")
>>> e.heat_capacity_notes
''
>>> e.heat_capacity_notes='from Wikipedia, at 189.78K'
>>> e.heat_capacity_notes
'from Wikipedia, at 189.78K'

property heat_of_fusion

Get or set the heat of fusion.

A tuple is required to set this. The first item is a value or lower bound for heat of fusion. The optional second item is an upper bound for heat of fusion, or None. The optional third item is units for the heat of fusion.

>>> from ccdc.entry import Entry
>>> e=Entry.from_string("OCO")
>>> e.heat_of_fusion = (9.019,)
>>> e.heat_of_fusion
(9.019, 9.019, '')
>>> e.heat_of_fusion = (9,9.1)
>>> e.heat_of_fusion
(9.0, 9.1, '')
>>> e.heat_of_fusion = (9,9.1,'KJ/mol')
>>> e.heat_of_fusion
(9.0, 9.1, 'KJ/mol')

property heat_of_fusion_notes

Get or set the notes on the heat of fusion.

property solubility_data

Get or set the solubility data, a list of ccdc.entry.SolubilityMeasurement objects.

>>> from ccdc.entry import Entry, SolubilityMeasurement
>>> e=Entry.from_string('CC(C)Cc1ccc(cc1)C(C)C(O)=O')
>>> sol = SolubilityMeasurement('21-22', 25, 'mg/L', 'deg.C', 'from PubChem')
>>> sol.add_solvent('water', 100)
>>> e.solubility_data = [sol]
>>> e.solubility_data
[SolubilityMeasurement(21 - 22, 25.0, "mg/L", "deg.C", from PubChem)]
>>> e.solubility_data[0].solvents
(('water', 100.0),)

class ccdc.entry.SolubilityMeasurement(solubility, temperature, solubility_unit='mg/mL', temperature_unit='deg.C', notes='')[source]

A solubility measurement.

Parameters:
  • solubility – a solubility value, range tuple or string; see ccdc.entry.SolubilityMeasurement.Solubility

  • temperature – a measurement temperature value.

  • solubility_unit – the solubility units, default mg/mL.

  • temperature_unit – the temperature units, default deg.C.

  • notes – notes for this measurement.

class Solubility(solubility)[source]

A solubility value range.

This can be created from a single value, from a tuple of (lower, upper) bound values, or from a string describing the range such as “< 15” or “3 - 8”.

property max

The maximum solubility value, or None if there is no maximum value.

property min

The minimum solubility value, or None if there is no minimum value.
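
To make the accepted string forms concrete, the following sketch parses strings such as "3 - 8" or "< 15" into a (minimum, maximum) pair, with None for a missing bound. It is an illustration only and not the implementation of ccdc.entry.SolubilityMeasurement.Solubility. The function name parse_solubility is hypothetical, and treating a single value as both minimum and maximum is an assumption.

```python
import re

NUMBER = r'(\d+(?:\.\d+)?)'


def parse_solubility(text):
    """Return (minimum, maximum) for strings like '3 - 8', '< 15' or '1.1'."""
    text = text.strip()
    # Range form: two numbers separated by a hyphen.
    match = re.fullmatch(NUMBER + r'\s*-\s*' + NUMBER, text)
    if match:
        return (float(match.group(1)), float(match.group(2)))
    # Qualifier form: '<' gives an upper bound, '>' gives a lower bound.
    match = re.fullmatch(r'([<>])\s*' + NUMBER, text)
    if match:
        value = float(match.group(2))
        return (None, value) if match.group(1) == '<' else (value, None)
    # Single value (assumption: minimum and maximum are the same).
    value = float(text)
    return (value, value)


print(parse_solubility('3 - 8'))  # (3.0, 8.0)
print(parse_solubility('< 15'))   # (None, 15.0)
```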

add_solvent(solvent, ratio)[source]

Add a solvent and its percentage to the solubility measurement.

Call this method once for each solvent.

Parameters:
  • solvent – name of solvent

  • ratio – percentage of solvent

property notes

The solubility measurement notes

property solubility

The solubility value, a ccdc.entry.SolubilityMeasurement.Solubility

property solubility_unit

The solubility unit

property solvents

The list of solvents and percentages

Each item is returned as a tuple of (name, percentage).

property temperature

The temperature value

property temperature_unit

The temperature unit

Search classes

class ccdc.search.TextNumericSearch(settings=None)[source]

Class to define and run text/numeric searches in a crystal structure database.

It is possible to add one or more criteria for the query to match.

>>> from ccdc.search import TextNumericSearch
>>> text_numeric_query = TextNumericSearch()
>>> text_numeric_query.add_compound_name('aspirin')
>>> text_numeric_query.add_citation(year=[2011, 2013])
>>> for hit in text_numeric_query.search(max_hit_structures=3):
...     print(hit.identifier)
...
ACSALA19
ACSALA20
ACSALA21

A human-readable representation of the queries may be obtained:

>>> print(', '.join(q for q in text_numeric_query.queries))
Compound name aspirin anywhere , Journal year in range 2011-2013

add_heat_capacity_notes(heat_capacity_notes, mode='anywhere', ignore_non_alpha_num=False)[source]

Search for heat capacity notes.

add_heat_of_fusion_notes(heat_of_fusion_notes, mode='anywhere', ignore_non_alpha_num=False)[source]

Search for heat of fusion notes.

add_solubility_notes(solubility_notes, mode='anywhere', ignore_non_alpha_num=False)[source]

Search for solubility notes.