IO API
Introduction
Module for reading and writing of molecules, crystals and database entries.
There are three types of readers: MoleculeReader, CrystalReader and EntryReader. The latter is used to read in database entries. It can also be used to read sdf files with the entry’s attributes dictionary formatted as SD tags.
Retrieving database entries from the CSD:
from ccdc.io import EntryReader

# Create a CSD entry reader, including any updates which may be present
csd_entry_reader = EntryReader('CSD')
# Similarly, a set of in-house databases may be adjoined to the CSD by constructing readers over
# a list of files.
# Retrieve an entry based upon its index
first_csd_entry = csd_entry_reader[0]
# Access an entry/crystal/molecule based upon its identifier
abebuf_entry = csd_entry_reader.entry('ABEBUF')
abebuf_crystal = csd_entry_reader.crystal('ABEBUF')
abebuf_molecule = csd_entry_reader.molecule('ABEBUF')
# Loop over all CSD entries
for entry in csd_entry_reader:
    print(entry.identifier)
# Loop over all the molecules
for mol in csd_entry_reader.molecules():
    print(mol.smiles)
Accessing molecules from a file:
from ccdc.io import MoleculeReader

# Create a molecule reader
mol_reader = MoleculeReader('my_molecules.mol2')
# Retrieve a molecule based upon its index
first_molecule = mol_reader[0]
# Loop over all molecules
for mol in mol_reader:
    print(mol.smiles)
There are three types of writers: MoleculeWriter, CrystalWriter and EntryWriter. The latter can be used to write out sdf files with the entry’s attributes dictionary formatted as SD tags. The writers inherit functionality from the private base class _DatabaseWriter.
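Using a MoleculeWriter to write out a molecule:
from ccdc.io import MoleculeWriter

# The writer is a context manager, so the file is closed on leaving the block
with MoleculeWriter('abebuf.mol2') as mol_writer:
    mol_writer.write(abebuf_molecule)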
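Readers and writers are often combined, for example to convert a file from one format to another. The following is a minimal sketch that relies only on what is shown above: a reader can be iterated and a writer has a write method. The names copy_molecules and ListWriter are hypothetical and introduced purely for illustration; ListWriter stands in for a real writer so that the sketch runs without a CSD installation.

```python
def copy_molecules(reader, writer):
    """Write every item yielded by `reader` to `writer` and return the count."""
    count = 0
    for mol in reader:
        writer.write(mol)
        count += 1
    return count


class ListWriter:
    """Hypothetical stand-in for a writer: it just collects what it is given."""

    def __init__(self):
        self.written = []

    def write(self, item):
        self.written.append(item)


# Stand-in data. With the CSD Python API installed, one might instead pass
# MoleculeReader('my_molecules.mol2') and MoleculeWriter('my_molecules.sdf').
stub_writer = ListWriter()
copied = copy_molecules(["mol_a", "mol_b", "mol_c"], stub_writer)
print(copied)
```

Passing the reader and writer in as arguments keeps the loop independent of the file formats involved.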
API
CSD location and version number
Readers
- class ccdc.io._DatabaseReader(fname, db='')
Base class for database readers.
Readers are context managers, supporting the syntax:
with MoleculeReader(filename) as filehandle:
    for mol in filehandle:
        print(mol.smiles)
- identifier(i)
Random access to identifiers.
- Parameters
i – int index
- Returns
str identifier
- property journals
The list of journals held in a database.
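Random access through identifier(i) can be combined with the context-manager syntax. The sketch below is illustrative only: first_identifiers and StubReader are hypothetical names, and StubReader imitates just the two behaviours documented above (use in a with statement and identifier(i)) so that the example runs without a CSD installation.

```python
def first_identifiers(reader, count):
    """Collect the first `count` identifiers using random access by index."""
    return [reader.identifier(i) for i in range(count)]


class StubReader:
    """Hypothetical stand-in that mimics a reader's documented interface."""

    def __init__(self, identifiers):
        self._identifiers = list(identifiers)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return False  # do not suppress exceptions

    def identifier(self, i):
        return self._identifiers[i]


# Placeholder identifiers, not real refcodes.
with StubReader(["STUB01", "STUB02", "STUB03"]) as reader:
    ids = first_identifiers(reader, 2)
print(ids)
```

With the CSD available, the same helper could presumably be given an EntryReader('CSD') in place of the stub.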
- class ccdc.io.EntryReader(filename='', db='', format='', subset='')
Treat the database as a source of entries.
An EntryReader can be instantiated using:
- The explicit string ‘CSD’, which defaults to the CSD.
- A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘sqlite’, ‘csdsql’, ‘csdsqlx’, ‘sqlmol2’]. If the format argument is empty it uses the suffix of the file name to infer the file format.
- A list of connection strings, to specify a pool.
One of the supported file formats is ‘identifiers’, in which case the file is assumed to contain a newline-separated list of refcodes from the CSD. The suffix of such a file may be ‘.gcd’.
During initialisation a _DatabaseReader is dynamically bound to the EntryReader instance, which means that the methods of _DatabaseReader are available from the EntryReader instance.
>>> csd_entry_reader = EntryReader('CSD')
>>> type(csd_entry_reader[0])
<class 'ccdc.entry.Entry'>
>>> print(csd_entry_reader.identifier(0))
AABHTZ
>>> aabhtz_entry = csd_entry_reader.entry('AABHTZ')
>>> print(aabhtz_entry.publication.authors)
P.-E.Werner
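The ‘identifiers’ format described above is a plain text file with one refcode per line, so such a file can be prepared with the standard library alone. The following is a minimal sketch: write_identifier_file and print_entry_identifiers are hypothetical helper names, and the second helper is only defined, not called, because it needs the CSD Python API and a CSD installation.

```python
import tempfile
from pathlib import Path


def write_identifier_file(path, refcodes):
    """Write one refcode per line, the layout described for 'identifiers' files."""
    path = Path(path)
    path.write_text("\n".join(refcodes) + "\n")
    return path


def print_entry_identifiers(gcd_path):
    """Read the file back through EntryReader (requires a CSD installation)."""
    from ccdc.io import EntryReader  # imported lazily so the sketch loads without ccdc

    # The format is stated explicitly rather than inferred from the suffix.
    with EntryReader(str(gcd_path), format='identifiers') as reader:
        for entry in reader:
            print(entry.identifier)


# AABHTZ and ABEBUF are the refcodes used elsewhere on this page.
gcd_path = write_identifier_file(
    Path(tempfile.mkdtemp()) / "selection.gcd", ["AABHTZ", "ABEBUF"]
)
```

Where the CSD is installed, calling print_entry_identifiers(gcd_path) would be expected to print the identifiers of the listed entries.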
- class ccdc.io.CrystalReader(filename='', db='', format='', subset='')
Treat the database as a source of crystals.
A CrystalReader can be instantiated using:
- The explicit string ‘CSD’, which defaults to the CSD.
- A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘sqlite’, ‘csdsql’, ‘csdsqlx’, ‘sqlmol2’]. If the format argument is empty it uses the suffix of the file name to infer the file format.
One of the supported file formats is ‘identifiers’, in which case the file is assumed to contain a newline-separated list of refcodes from the CSD. The suffix of such a file may be ‘.gcd’.
During initialisation a _DatabaseReader is dynamically bound to the CrystalReader instance, which means that the methods of _DatabaseReader are available from the CrystalReader instance.
>>> csd_crystal_reader = CrystalReader('CSD')
>>> type(csd_crystal_reader[0])
<class 'ccdc.crystal.Crystal'>
>>> print(csd_crystal_reader.identifier(0))
AABHTZ
>>> aabhtz_crystal = csd_crystal_reader.crystal('AABHTZ')
>>> print(aabhtz_crystal.crystal_system)
triclinic
- class ccdc.io.MoleculeReader(filename='', db='', format='', subset='')
Treat the database as a source of molecules.
A MoleculeReader can be instantiated using:
- The explicit string ‘CSD’, which defaults to the CSD.
- A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘sqlite’, ‘csdsql’, ‘csdsqlx’, ‘sqlmol2’]. If the format argument is empty it uses the suffix of the file name to infer the file format.
One of the supported file formats is ‘identifiers’, in which case the file is assumed to contain a newline-separated list of refcodes from the CSD. The suffix of such a file may be ‘.gcd’.
During initialisation a _DatabaseReader is dynamically bound to the MoleculeReader instance, which means that the methods of _DatabaseReader are available from the MoleculeReader instance.
>>> csd_molecule_reader = MoleculeReader('CSD')
>>> type(csd_molecule_reader[0])
<class 'ccdc.molecule.Molecule'>
>>> print(csd_molecule_reader.identifier(0))
AABHTZ
>>> aabhtz_molecule = csd_molecule_reader.molecule('AABHTZ')
>>> print(aabhtz_molecule.smiles)
CC(=O)NN1C=NN=C1N(N=Cc1c(Cl)cccc1Cl)C(C)=O
Writers
- class ccdc.io._DatabaseWriter(fname, append=False)
Base class for database formats.
- Parameters
fname – The filename of the database to create or open.
append – Append to the database when True, rather than replace it.
Writers are context managers, supporting the syntax:
with MoleculeWriter('output.mol2', append=True) as filehandle:
    filehandle.write(mol)
- write_entry(e)
Appends an entry to the database to be written out.
- Parameters
e – ccdc.entry.Entry
- class ccdc.io.EntryWriter(fname, format='', append=False)
Writes database entries by default.
An EntryWriter can be instantiated using:
- A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘pdb’, ‘csdsql’]. If the format argument is empty it uses the suffix of the file name to infer the file format. When the suffix is “.cif” we recommend using the format argument to specify which of CIF or mmCIF format is required, otherwise the writer will select one based on the data written.
- An optional append argument which tells the writer to append rather than replace existing content.
- remove(id)
Remove an identifier or entry from the database.
- Parameters
id – str or ccdc.entry.Entry
- write(e)
Write the entry.
- Parameters
e – ccdc.entry.Entry
- class ccdc.io.CrystalWriter(fname, format='', append=False)
Writes crystals by default.
A CrystalWriter can be instantiated using:
- A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘pdb’, ‘csdsql’]. If the format argument is empty it uses the suffix of the file name to infer the file format. When the suffix is “.cif” we recommend using the format argument to specify which of CIF or mmCIF format is required, otherwise the writer will select one based on the data written.
- An optional append argument which tells the writer to append rather than replace existing content.
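The recommendation to state the format explicitly for “.cif” files can be wrapped in a small helper. This is a sketch under stated assumptions: explicit_cif_format and write_crystal are hypothetical names, the write method on CrystalWriter is assumed to behave like the write calls shown for the other writers, and write_crystal is only defined here because it needs the CSD Python API.

```python
from pathlib import Path


def explicit_cif_format(filename, mmcif=False):
    """Return 'cif' or 'mmcif' for '.cif' files, or '' to leave inference to the writer."""
    if Path(filename).suffix.lower() == ".cif":
        return "mmcif" if mmcif else "cif"
    return ""


def write_crystal(crystal, filename, mmcif=False):
    """Write one crystal with the format stated explicitly (requires ccdc)."""
    from ccdc.io import CrystalWriter  # imported lazily so the sketch loads without ccdc

    # Assumption: CrystalWriter exposes write() like the other writers on this page.
    with CrystalWriter(filename, format=explicit_cif_format(filename, mmcif)) as writer:
        writer.write(crystal)


print(explicit_cif_format("structure.cif"))
print(explicit_cif_format("structure.cif", mmcif=True))
```

For any other suffix the helper returns an empty string, which matches the documented default of inferring the format from the file name.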
- class ccdc.io.MoleculeWriter(fname, format='', append=False)
Writes molecules by default.
A MoleculeWriter can be instantiated using:
- A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘pdb’, ‘csdsql’]. If the format argument is empty it uses the suffix of the file name to infer the file format. When the suffix is “.cif” we recommend using the format argument to specify which of CIF or mmCIF format is required, otherwise the writer will select one based on the data written.
- An optional append argument which tells the writer to append rather than replace existing content.
Subsets
- class ccdc.io.Subsets
This class provides a simple way to access pre-defined CSD subsets.
Example:
>>> mof_reader = EntryReader(subset=Subsets.MOF)
The returned reader object is the same as if the reader class had been initialized with the associated GCD file directly.
- Subsets available:
ADP
BEST_HYDROGENS
BEST_LOW_TEMP
BEST_RFACTOR
BEST_ROOM_TEMP
COVID19
DRUG
DRUG_SINGLE_COMPONENT
ELECTRON
HIGH_PRESSURE
HYDRATE
MOF
MOF_NO_DISORDER
MOF_1D
MOF_2D
MOF_3D
PESTICIDE
POLYMORPHIC
TEACHING
MINIMAL_DISORDER
SIGNIFICANT_DISORDER
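A subset reader can be large, so it is often convenient to preview only its first few entries. The sketch below is illustrative: leading_identifiers and StubEntry are hypothetical names, and StubEntry imitates only the identifier attribute of an entry so that the example runs without a CSD installation.

```python
from itertools import islice


def leading_identifiers(entries, limit):
    """Return the identifiers of at most `limit` entries, stopping early."""
    return [entry.identifier for entry in islice(entries, limit)]


class StubEntry:
    """Hypothetical stand-in exposing only the `identifier` attribute."""

    def __init__(self, identifier):
        self.identifier = identifier


# Placeholder identifiers, not real refcodes. With the CSD installed, one
# might instead pass EntryReader(subset=Subsets.MOF) as `entries`.
stub_entries = (StubEntry(name) for name in ["STUB01", "STUB02", "STUB03"])
preview = leading_identifiers(stub_entries, 2)
print(preview)
```

Because islice stops after limit items, the helper does not iterate over the rest of the subset.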