Solubility Platform¶
Note
The Solubility Platform features are only available with a license that includes the Solubility feature.
Ingestion¶
We create a molecule object from which we can create a database entry object. We then update the database entry with experimental data.
Modules import¶
>>> from ccdc import entry
>>> from ccdc import io
>>> from ccdc import search
Creating a new entry¶
A database entry can be created from a file in a suitable format, for example CIF, mol or mol2, or from a SMILES string.
>>> # Create a database entry from a SMILES string.
>>> database_entry = entry.Entry.from_string('CC(C)Cc1ccc(cc1)C(C)C(O)=O')
Updating the entry¶
Various attributes of the new database entry can be updated with new values:
>>> # To set a unique in-house identifier
>>> database_entry.identifier = 'compound123'
>>> # The following might be deduced automatically from the input file
>>> # But they can also be updated explicitly if required
>>> database_entry.chemical_name = 'ibuprofen'
>>> database_entry.formula = 'C13 H18 O2'
>>> database_entry.polymorph = 'polymorph 1'
Adding thermodynamic fields¶
Values for thermodynamic fields can be added to the database entry, for example:
>>> # Melting points and unit
>>> database_entry.input_melting_point_range = (175, 177, 'Deg.C')
>>> # Heat of fusion
>>> database_entry.heat_of_fusion = (276, None, 'J/g')
>>> database_entry.heat_of_fusion_notes = 'melted slowly in my hand'
>>> # Change in heat capacity
>>> database_entry.heat_capacity = (0.5, 'J/K')
>>> database_entry.heat_capacity_notes = "didn't burn too badly when picked up hot"
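When such values arrive as text, for instance from a spreadsheet export, they need converting into the tuple shapes shown above. The following is a minimal sketch of one way to do that. `range_tuple` is a hypothetical helper, not part of the ccdc package, and it assumes that a blank upper bound should become `None`, as in the heat of fusion example.

```python
def range_tuple(lower_text, upper_text, unit):
    """Build a (lower, upper_or_None, unit) tuple from text fields.

    Hypothetical helper for illustration only; not part of the ccdc package.
    """
    lower = float(lower_text)
    # Assumption: an empty upper bound means "single value", stored as None.
    upper = float(upper_text) if upper_text.strip() else None
    return (lower, upper, unit.strip())


melting_point = range_tuple("175", "177", "Deg.C")
heat_of_fusion = range_tuple("276", "", "J/g")
print(melting_point)
print(heat_of_fusion)
```

The resulting tuples could then be assigned to `input_melting_point_range` and `heat_of_fusion` in the same way as the literal tuples above.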
Adding solubility table¶
Solubility measurements at different temperatures and in different solvents are recorded in a table.
>>> # Parameters are solubility and temperature. Solubility can be a single number, a (lower, upper) range, or a string with a qualifier, e.g. '< 15'
>>> measurement1 = entry.SolubilityMeasurement((0, 0.1), 25)
>>> # Specify solvent name and percentage
>>> measurement1.add_solvent('water', 90)
>>> measurement1.add_solvent('ethanol', 10)
>>> # Do the same for another measurement
>>> measurement2 = entry.SolubilityMeasurement(1.1, 25, solubility_unit='?', temperature_unit='?', notes='post ingestion of measurement 1')
>>> measurement2.add_solvent('water', 100)
>>> # Now add to the database entry
>>> database_entry.solubility_data = [measurement1, measurement2]
Writing to database¶
You can create a new database or add the new entry to an existing database:
>>> with io.EntryWriter('database.csdsql', append=True) as database_writer:
...     database_writer.write(database_entry)
Updating Records¶
>>> database_entry.chemical_name = 'Ibuprofen'
>>> with io.EntryWriter('database.csdsql', append=True) as database_writer:
...     database_writer.write(database_entry)
Deleting Records¶
>>> with io.EntryWriter('database.csdsql', append=True) as database_writer:
...     database_writer.remove('compound123')
Example Script¶
An example script (ingestion.py) is provided to demonstrate how ingestion takes place for solubility data in csv files. The script expects the input to be in the following format.
Folder of CIF, mol or mol2 files, with the identifier as the file name
Input csv file for the compound and thermodynamic data with column headings:
Identifier, Compound name, Formula, Polymorph, Melting Temperature min, Melting Temperature max, Melting Temperature Unit, Melting Temperature notes, Enthalpy of Fusion min, Enthalpy of Fusion Unit, Enthalpy of Fusion max, Enthalpy of Fusion notes, Heat Capacity, Heat Capacity Unit, Heat Capacity notes
Folder containing csv files of solubility data with column headings:
Identifier, Solubility, Solubility Unit, Temperature, Temperature Unit, solvent 1, % solvent 1, solvent 2, % solvent 2, solvent 3, % solvent 3, solvent 4, % solvent 4, Notes
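As an illustration of the solubility file layout, the sketch below reads one row with the column headings listed above, using only the standard library. The sample row is made up for the example, and `solvents_from_row` is a hypothetical helper; this is not the code of ingestion.py. `StringIO` is imported by name so that it does not clash with `from ccdc import io`.

```python
import csv
from io import StringIO

# Made-up sample data that follows the column headings listed above.
SAMPLE = (
    "Identifier,Solubility,Solubility Unit,Temperature,Temperature Unit,"
    "solvent 1,% solvent 1,solvent 2,% solvent 2,"
    "solvent 3,% solvent 3,solvent 4,% solvent 4,Notes\n"
    "compound123,0.05,mg/mL,25,deg.C,water,90,ethanol,10,,,,,example row\n"
)


def solvents_from_row(row):
    """Collect the non-empty (solvent, percentage) pairs from one CSV row."""
    pairs = []
    for index in range(1, 5):  # the format allows up to four solvents
        name = (row.get(f"solvent {index}") or "").strip()
        if name:
            pairs.append((name, float(row[f"% solvent {index}"])))
    return pairs


rows = list(csv.DictReader(StringIO(SAMPLE), skipinitialspace=True))
first = rows[0]
print(first["Identifier"], first["Solubility"], solvents_from_row(first))
```

Each pair returned by `solvents_from_row` corresponds to one `add_solvent` call on a `SolubilityMeasurement`.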
Running the script creates a new database at the file path given by the OUTPUT_DATABASE variable. If a database already exists at that path, the script attempts to update it with the new entries. The identifiers of all entries in a database must be unique, so to regenerate the database from scratch, delete the existing one before re-running the script.
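Because identifiers must be unique, it can help to look for repeats in the input before ingestion. The following is a small sketch of such a check. `duplicated_identifiers` is a hypothetical helper, not part of the ccdc package or of ingestion.py, and it compares identifiers exactly as written.

```python
from collections import Counter


def duplicated_identifiers(identifiers):
    """Return identifiers that occur more than once, in first-seen order."""
    counts = Counter(identifiers)
    return [identifier for identifier in counts if counts[identifier] > 1]


print(duplicated_identifiers(["compound123", "compound124", "compound123"]))
```

An empty result means that every identifier in the list occurs only once.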
Example script usage:
usage: ingestion.py [-h] [-c INPUT_CSV] [-i INPUT_DIRECTORY]
                    [-s SOLUBILITY_DIRECTORY] [-o OUTPUT_DATABASE]

optional arguments:
  -h, --help            show this help message and exit
  -c INPUT_CSV          a csv file containing the list of compounds (default:
                        ...)
  -i INPUT_DIRECTORY    a directory containing the input mol/mol2/CIF files
                        (default: ...)
  -s SOLUBILITY_DIRECTORY
                        a directory containing csv files of solubility
                        measurement data (default: ...)
  -o OUTPUT_DATABASE    the output database (default: ...)
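A command line like the one above could be declared with `argparse` roughly as follows. This is a sketch and not the source of ingestion.py: the flags and help texts are taken from the usage message, while the `dest` names are inferred from the metavars. The defaults are elided in the usage message, so they are left as `None` here.

```python
import argparse


def build_parser():
    """Build a parser with the four options shown in the usage message."""
    parser = argparse.ArgumentParser(prog="ingestion.py")
    # Defaults are not shown in the documentation, so None is used as a placeholder.
    parser.add_argument("-c", dest="input_csv", default=None,
                        help="a csv file containing the list of compounds")
    parser.add_argument("-i", dest="input_directory", default=None,
                        help="a directory containing the input mol/mol2/CIF files")
    parser.add_argument("-s", dest="solubility_directory", default=None,
                        help="a directory containing csv files of solubility measurement data")
    parser.add_argument("-o", dest="output_database", default=None,
                        help="the output database")
    return parser


# The file names below are examples only.
args = build_parser().parse_args(["-c", "compounds.csv", "-o", "database.csdsql"])
print(args.input_csv, args.output_database)
```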
Searching for notes¶
A text/numeric search can be used to find text in each of the notes fields.
>>> notes_search = search.TextNumericSearch()
>>> notes_search.add_heat_capacity_notes("burn")
>>> [hit.identifier for hit in notes_search.search("database.csdsql")]
['compound123']
>>> notes_search.clear()
>>> notes_search.add_heat_of_fusion_notes("melt")
>>> [hit.identifier for hit in notes_search.search("database.csdsql")]
['compound123']
>>> notes_search.clear()
>>> notes_search.add_solubility_notes("ingestion")
>>> [hit.identifier for hit in notes_search.search("database.csdsql")]
['compound123']
Detailed API Documentation¶
This is a list of the classes and attributes in the ccdc.entry and ccdc.search modules of the CSD Python API that are available specifically for the Solubility Platform.
Entry¶
- class ccdc.entry.Entry(_entry=None)[source]
A database entry.
- property heat_capacity
Get or set the heat capacity.
- Returns:
a two-item tuple containing the heat capacity value and units
When setting, pass a tuple with either a single value, or a value and a units string.
>>> from ccdc.entry import Entry
>>> e = Entry.from_string("OCO")
>>> e.heat_capacity = (54.55,)
>>> e.heat_capacity
(54.55, '')
>>> e.heat_capacity = (54.55, 'J/K')
>>> e.heat_capacity
(54.55, 'J/K')
- property heat_capacity_notes
Get or set the notes on the heat capacity.
>>> from ccdc.entry import Entry
>>> e = Entry.from_string("OCO")
>>> e.heat_capacity_notes
''
>>> e.heat_capacity_notes = 'from Wikipedia, at 189.78K'
>>> e.heat_capacity_notes
'from Wikipedia, at 189.78K'
- property heat_of_fusion
Get or set the heat of fusion.
A tuple is required to set this. The first item is a value or lower bound for the heat of fusion. The optional second item is an upper bound for the heat of fusion, or None. The optional third item is the units for the heat of fusion.
>>> from ccdc.entry import Entry
>>> e = Entry.from_string("OCO")
>>> e.heat_of_fusion = (9.019,)
>>> e.heat_of_fusion
(9.019, 9.019, '')
>>> e.heat_of_fusion = (9, 9.1)
>>> e.heat_of_fusion
(9.0, 9.1, '')
>>> e.heat_of_fusion = (9, 9.1, 'KJ/mol')
>>> e.heat_of_fusion
(9.0, 9.1, 'KJ/mol')
- property heat_of_fusion_notes
Get or set the notes on the heat of fusion.
- property solubility_data
Get or set the solubility data, a list of ccdc.entry.SolubilityMeasurement.
>>> from ccdc.entry import Entry, SolubilityMeasurement
>>> e = Entry.from_string('CC(C)Cc1ccc(cc1)C(C)C(O)=O')
>>> sol = SolubilityMeasurement('21-22', 25, 'mg/L', 'deg.C', 'from PubChem')
>>> sol.add_solvent('water', 100)
>>> e.solubility_data = [sol]
>>> e.solubility_data
[SolubilityMeasurement(21 - 22, 25.0, "mg/L", "deg.C", from PubChem)]
>>> e.solubility_data[0].solvents
(('water', 100.0),)
- class ccdc.entry.SolubilityMeasurement(solubility, temperature, solubility_unit='mg/mL', temperature_unit='deg.C', notes='')[source]
A solubility measurement.
- Parameters:
solubility – a solubility value, range tuple or string, see ccdc.entry.SolubilityMeasurement.Solubility
temperature – a measurement temperature value.
solubility_unit – the solubility units, default mg/mL.
temperature_unit – the temperature units, default deg.C.
notes – notes for this measurement.
- class Solubility(solubility)[source]
A solubility value range.
This can be created from a single value, from a tuple of (lower, upper) bound values, or from a string describing the range such as “< 15” or “3 - 8”.
- property max
The maximum solubility value, or None if there is no maximum value.
- property min
The minimum solubility value, or None if there is no minimum value.
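To make the three accepted input forms concrete, the sketch below maps a number, a (lower, upper) tuple, or a range string onto a (minimum, maximum) pair. It is an illustration only and not the implementation used by ccdc. `parse_range` is a hypothetical function, and representing a single value with equal minimum and maximum is an assumption.

```python
import re

_NUMBER = r"\d+(?:\.\d+)?"


def parse_range(value):
    """Return (minimum, maximum); None marks a missing bound.

    Hypothetical illustration only; not the ccdc implementation.
    """
    if isinstance(value, (int, float)):
        # Assumption: a single value has identical lower and upper bounds.
        return (float(value), float(value))
    if isinstance(value, tuple):
        lower, upper = value
        return (None if lower is None else float(lower),
                None if upper is None else float(upper))
    text = value.strip()
    match = re.fullmatch(rf"({_NUMBER})\s*-\s*({_NUMBER})", text)
    if match:  # a range such as "3 - 8"
        return (float(match.group(1)), float(match.group(2)))
    match = re.fullmatch(rf"([<>])=?\s*({_NUMBER})", text)
    if match:  # a qualified value such as "< 15"
        bound = float(match.group(2))
        return (None, bound) if match.group(1) == "<" else (bound, None)
    return (float(text), float(text))


for example in (1.1, (0, 0.1), "3 - 8", "< 15"):
    print(repr(example), "->", parse_range(example))
```

In this sketch a missing bound is returned as `None`, mirroring the behaviour described for the `min` and `max` properties.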
- add_solvent(solvent, ratio)[source]
Add a solvent-ratio to the solubility measurement
Run this method for each solvent.
- Parameters:
solvent – name of solvent
ratio – percentage of solvent
- property notes
The solubility measurement notes
- property solubility
The solubility value, a ccdc.entry.SolubilityMeasurement.Solubility
- property solubility_unit
The solubility unit
- property solvents
The list of solvents and percentages
Each item is returned as a tuple of (name, percentage).
- property temperature
The temperature value
- property temperature_unit
The temperature unit
Search classes¶
- class ccdc.search.TextNumericSearch(settings=None)[source]
Class to define and run text/numeric searches in a crystal structure database.
It is possible to add one or more criteria for the query to match.
>>> text_numeric_query = TextNumericSearch()
>>> text_numeric_query.add_compound_name('aspirin')
>>> text_numeric_query.add_citation(year=[2011, 2013])
>>> for hit in text_numeric_query.search(max_hit_structures=3):
...     print(hit.identifier)
...
ACSALA19
ACSALA20
ACSALA21
A human-readable representation of the queries may be obtained:
>>> print(', '.join(q for q in text_numeric_query.queries))
Compound name aspirin anywhere , Journal year in range 2011-2013
- add_heat_capacity_notes(heat_capacity_notes, mode='anywhere', ignore_non_alpha_num=False)[source]
Search for heat capacity notes.
- add_heat_of_fusion_notes(heat_of_fusion_notes, mode='anywhere', ignore_non_alpha_num=False)[source]
Search for heat of fusion notes.
- add_solubility_notes(solubility_notes, mode='anywhere', ignore_non_alpha_num=False)[source]
Search for solubility notes.