IO API

Introduction

Module for reading and writing molecules, crystals and database entries.

There are three types of readers: MoleculeReader, CrystalReader and EntryReader. The latter is used to read in database entries. It can also be used to read sdf files with the entry’s attributes dictionary formatted as SD tags.

Retrieving database entries from the CSD:

# Creating a CSD entry reader, including any updates which may be present
csd_entry_reader = EntryReader('CSD')

# Similarly a set of in-house databases may be adjoined to the CSD by constructing readers over
# a list of files.

# Retrieve an entry based upon its index
first_csd_entry = csd_entry_reader[0]

# Access an entry/crystal/molecule based upon its identifier
abebuf_entry = csd_entry_reader.entry('ABEBUF')
abebuf_crystal = csd_entry_reader.crystal('ABEBUF')
abebuf_molecule = csd_entry_reader.molecule('ABEBUF')

# Loop over all CSD entries
for entry in csd_entry_reader:
    print(entry.identifier)

# Loop over all the molecules
for mol in csd_entry_reader.molecules():
    print(mol.smiles)
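
Looping over every CSD entry can take a long time, so while exploring it can help to stop after a fixed number of items. The following is a minimal sketch rather than part of ccdc.io: the helper names first_identifiers and demo are hypothetical, and the helper assumes only that the reader is iterable and that each item has an identifier attribute, as in the loop above.

```python
from itertools import islice


def first_identifiers(reader, n):
    """Return the identifiers of the first n items yielded by reader.

    Hypothetical helper: it relies only on iteration and on each item
    exposing an ``identifier`` attribute.
    """
    return [item.identifier for item in islice(reader, n)]


def demo(n=5):
    # Assumption: the CSD Python API and a CSD installation are available.
    # The import is local so that first_identifiers can be used without them.
    from ccdc.io import EntryReader

    with EntryReader('CSD') as csd_entry_reader:
        return first_identifiers(csd_entry_reader, n)
```

Because islice stops pulling items once n have been seen, the rest of the database is never visited.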

Accessing molecules from a file:

# Creating a molecule reader
mol_reader = MoleculeReader('my_molecules.mol2')

# Retrieve a molecule based upon its index
first_molecule = mol_reader[0]

# Loop over all molecules
for mol in mol_reader:
    print(mol.smiles)

There are three types of writers: MoleculeWriter, CrystalWriter and EntryWriter. The latter can be used to write out sdf files with the entry’s attributes dictionary formatted as SD tags. The writers inherit functionality from the private base class _DatabaseWriter.

Using a MoleculeWriter to write out a molecule:

with MoleculeWriter('abebuf.mol2') as mol_writer:
    mol_writer.write(abebuf_molecule)
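
Writing several molecules to one file follows the same pattern. This is a sketch under stated assumptions: write_molecules and copy_molecules are hypothetical helper names, and the only writer behaviour relied on is the write() call and the context-manager syntax shown in this document.

```python
def write_molecules(mol_writer, molecules):
    """Pass each molecule to mol_writer.write() and return how many were written."""
    count = 0
    for mol in molecules:
        mol_writer.write(mol)
        count += 1
    return count


def copy_molecules(source, destination):
    # Assumption: the CSD Python API is installed and both file names carry
    # suffixes from which the reader and writer can infer a format.
    from ccdc.io import MoleculeReader, MoleculeWriter

    with MoleculeReader(source) as mol_reader, \
            MoleculeWriter(destination) as mol_writer:
        return write_molecules(mol_writer, mol_reader)
```

Opening both objects in one with statement closes the reader and the writer even if a write fails part way through.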

API

CSD location and version number

ccdc.io.csd_directory()

Return the directory containing the CSD.

ccdc.io.csd_version()

Return the version of the CSD in use.
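
A short sketch of how the two functions might be combined for logging. The helper names describe_csd and demo are hypothetical, and no assumption is made about the exact type or format of the returned values: they are simply interpolated into a string.

```python
def describe_csd(io_module):
    """Format the CSD version and location reported by an io-like module."""
    version = io_module.csd_version()
    directory = io_module.csd_directory()
    return 'CSD version {} in {}'.format(version, directory)


def demo():
    # Assumption: the CSD Python API and a CSD installation are available.
    import ccdc.io

    return describe_csd(ccdc.io)
```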

Readers

class ccdc.io._DatabaseReader(fname, db='')

Base class for database readers.

Readers are context managers, supporting the syntax:

with MoleculeReader(filename) as filehandle:
    for mol in filehandle:
        print(mol.smiles)

close()

Close the database.

crystal(id)

Random access to crystals.

Parameters:

id – ccdc.crystal.Crystal.identifier

Returns:

ccdc.crystal.Crystal

crystals()

Generator for crystals in the database.

entries()

Generator for entries in the database.

entry(id)

Random access to entries.

Parameters:

id – ccdc.entry.Entry.identifier

Returns:

ccdc.entry.Entry

identifier(i)

Random access to identifiers.

Parameters:

i – int index

Returns:

str identifier

property journals

The list of journals held in a database.

molecule(id)

Random access to molecules.

Parameters:

id – ccdc.molecule.Molecule.identifier

Returns:

ccdc.molecule.Molecule

molecules()

Generator for molecules of the database.
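
The index-based identifier(i) and the identifier-based entry(id) documented above can be chained to fetch entries at chosen positions. This is a minimal sketch: entries_at is a hypothetical helper, and it assumes only those two methods with the parameters listed above.

```python
def entries_at(reader, indices):
    """Yield (identifier, entry) pairs for the given integer indices.

    Hypothetical helper: each index is first turned into an identifier with
    reader.identifier(i), which is then passed to reader.entry(id).
    """
    for i in indices:
        refcode = reader.identifier(i)
        yield refcode, reader.entry(refcode)
```

With a reader such as EntryReader('CSD'), a call like dict(entries_at(reader, [0, 1, 2])) would collect the first three entries keyed by identifier.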

class ccdc.io.EntryReader(filename='', db='', format='', subset='')

Treat the database as a source of entries.

An EntryReader can be instantiated using:
  • The explicit string ‘CSD’, which defaults to the CSD.

  • A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘sqlite’, ‘csdsql’, ‘csdsqlx’, ‘sqlmol2’]. If the format argument is empty it uses the suffix of the file name to infer the file format.

  • A list of connection strings, to specify a pool.

One of the supported file formats is ‘identifiers’, in which case the file is assumed to contain a newline-separated list of refcodes from the CSD. The suffix of such a file may be ‘.gcd’.
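
An ‘identifiers’ file can be produced with plain Python and then opened with the format given explicitly. The following is a sketch under stated assumptions: write_identifier_file and read_entry_identifiers are hypothetical helpers, and the file layout is simply one refcode per line as described above.

```python
def write_identifier_file(path, refcodes):
    """Write refcodes one per line, the layout described for 'identifiers' files."""
    with open(path, 'w') as handle:
        for refcode in refcodes:
            handle.write(refcode + '\n')
    return path


def read_entry_identifiers(path):
    # Assumption: the CSD Python API and a CSD installation are available.
    from ccdc.io import EntryReader

    # The format is stated explicitly, so the file suffix is not relied upon.
    with EntryReader(path, format='identifiers') as reader:
        return [entry.identifier for entry in reader]
```

Stating the format avoids depending on suffix inference when the list lives in a file with a different extension such as ‘.txt’.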

During initialisation a _DatabaseReader is dynamically bound to the EntryReader instance, which means that the methods of _DatabaseReader are available from the EntryReader instance.

>>> csd_entry_reader = EntryReader('CSD')
>>> type(csd_entry_reader[0])
<class 'ccdc.entry.Entry'>
>>> print(csd_entry_reader.identifier(0))
AABHTZ
>>> aabhtz_entry = csd_entry_reader.entry('AABHTZ')
>>> print(aabhtz_entry.publication.authors)
P.-E.Werner

class ccdc.io.CrystalReader(filename='', db='', format='', subset='')

Treat the database as a source of crystals.

A CrystalReader can be instantiated using:
  • The explicit string ‘CSD’, which defaults to the CSD.

  • A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘sqlite’, ‘csdsql’, ‘csdsqlx’, ‘sqlmol2’]. If the format argument is empty it uses the suffix of the file name to infer the file format.

One of the supported file formats is ‘identifiers’, in which case the file is assumed to contain a newline-separated list of refcodes from the CSD. The suffix of such a file may be ‘.gcd’.

During initialisation a _DatabaseReader is dynamically bound to the CrystalReader instance, which means that the methods of _DatabaseReader are available from the CrystalReader instance.

>>> csd_crystal_reader = CrystalReader('CSD')
>>> type(csd_crystal_reader[0])
<class 'ccdc.crystal.Crystal'>
>>> print(csd_crystal_reader.identifier(0))
AABHTZ
>>> aabhtz_crystal = csd_crystal_reader.crystal('AABHTZ')
>>> print(aabhtz_crystal.crystal_system)
triclinic
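
As a further illustration, the crystals yielded by a reader can be grouped by the crystal_system attribute used in the doctest above. This is a minimal sketch: group_by_crystal_system and demo are hypothetical names, and the file passed to demo is whatever crystal file the caller supplies.

```python
from collections import defaultdict


def group_by_crystal_system(crystals):
    """Return a dict mapping each crystal system to a list of identifiers."""
    groups = defaultdict(list)
    for crystal in crystals:
        groups[crystal.crystal_system].append(crystal.identifier)
    return dict(groups)


def demo(path):
    # Assumption: the CSD Python API is installed and path names a crystal
    # file in a format that CrystalReader can infer from its suffix.
    from ccdc.io import CrystalReader

    with CrystalReader(path) as crystal_reader:
        return group_by_crystal_system(crystal_reader)
```
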

class ccdc.io.MoleculeReader(filename='', db='', format='', subset='')

Treat the database as a source of molecules.

A MoleculeReader can be instantiated using:
  • The explicit string ‘CSD’, which defaults to the CSD.

  • A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘sqlite’, ‘csdsql’, ‘csdsqlx’, ‘sqlmol2’]. If the format argument is empty it uses the suffix of the file name to infer the file format.

One of the supported file formats is ‘identifiers’, in which case the file is assumed to contain a newline-separated list of refcodes from the CSD. The suffix of such a file may be ‘.gcd’.

During initialisation a _DatabaseReader is dynamically bound to the MoleculeReader instance, which means that the methods of _DatabaseReader are available from the MoleculeReader instance.

>>> csd_molecule_reader = MoleculeReader('CSD')
>>> type(csd_molecule_reader[0])
<class 'ccdc.molecule.Molecule'>
>>> print(csd_molecule_reader.identifier(0))
AABHTZ
>>> aabhtz_molecule = csd_molecule_reader.molecule('AABHTZ')
>>> print(aabhtz_molecule.smiles)
CC(=O)NN1C=NN=C1N(N=Cc1c(Cl)cccc1Cl)C(C)=O

Writers

class ccdc.io._DatabaseWriter(fname, append=False)

Base class for database formats.

Parameters:
  • fname – The filename of the database to create or open.

  • append – Append to the database when True, rather than replace it.

Writers are context managers, supporting the syntax:

with MoleculeWriter('output.mol2', append=True) as filehandle:
    filehandle.write(mol)

close()

Close the database.

remove(id)

Remove an identifier or entry from the database.

write_crystal(c)

Appends a crystal to the database to be written out.

Parameters:

c – ccdc.crystal.Crystal

write_entry(e)

Appends an entry to the database to be written out.

Parameters:

e – ccdc.entry.Entry

write_molecule(m)

Appends a molecule to the database to be written out.

Parameters:

m – ccdc.molecule.Molecule

class ccdc.io.EntryWriter(fname, format='', append=False)

Writes database entries by default.

An EntryWriter can be instantiated using:

  • A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘pdb’, ‘csdsql’, ‘csdsqlx’]. If the format argument is empty it uses the suffix of the file name to infer the file format. When the suffix is “.cif” we recommend using the format argument to specify which of CIF or mmCIF format is required, otherwise the writer will select one based on the data written.

  • An optional append argument which tells the writer to append rather than replace existing content.
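
The recommendation to state the format for “.cif” files can be wrapped in a small helper. This is a sketch, not part of ccdc.io: open_cif_writer is a hypothetical name, and it assumes only the (fname, format, append) constructor arguments documented for the writer classes.

```python
def open_cif_writer(writer_class, path, mmcif=False, append=False):
    """Construct a writer for path with the CIF flavour stated explicitly.

    writer_class is expected to accept (fname, format=..., append=...),
    as the writer classes in this module are documented to do.
    """
    fmt = 'mmcif' if mmcif else 'cif'
    return writer_class(path, format=fmt, append=append)
```

For example, open_cif_writer(EntryWriter, 'abebuf.cif') would request CIF output, while passing mmcif=True would request mmCIF, so the choice no longer depends on the data written.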

remove(id)

Remove an identifier or entry from the database.

Parameters:

id – str or ccdc.entry.Entry

write(e)

Write the entry.

Parameters:

e – ccdc.entry.Entry

class ccdc.io.CrystalWriter(fname, format='', append=False)

Writes crystals by default.

A CrystalWriter can be instantiated using:

  • A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘pdb’, ‘csdsql’, ‘csdsqlx’]. If the format argument is empty it uses the suffix of the file name to infer the file format. When the suffix is “.cif” we recommend using the format argument to specify which of CIF or mmCIF format is required, otherwise the writer will select one based on the data written.

  • An optional append argument which tells the writer to append rather than replace existing content.

write(c)

Write the crystal.

Parameters:

c – ccdc.crystal.Crystal

class ccdc.io.MoleculeWriter(fname, format='', append=False)

Writes molecules by default.

A MoleculeWriter can be instantiated using:

  • A file name with an optional format argument from [‘sdf’, ‘mol’, ‘mol2’, ‘identifiers’, ‘cif’, ‘mmcif’, ‘res’, ‘pdb’, ‘csdsql’, ‘csdsqlx’]. If the format argument is empty it uses the suffix of the file name to infer the file format. When the suffix is “.cif” we recommend using the format argument to specify which of CIF or mmCIF format is required, otherwise the writer will select one based on the data written.

  • An optional append argument which tells the writer to append rather than replace existing content.

write(m)

Write the molecule.

Parameters:

m – ccdc.molecule.Molecule

Subsets

class ccdc.io.Subsets

This class provides a simple way to access pre-defined CSD subsets.

Example:

>>> mof_reader = EntryReader(subset=Subsets.MOF)

The returned reader object is the same as if the reader class had been initialized with the associated GCD file directly.

Subsets available:
  • ADP

  • BEST_HYDROGENS

  • BEST_LOW_TEMP

  • BEST_RFACTOR

  • BEST_ROOM_TEMP

  • COVID19

  • DRUG

  • DRUG_SINGLE_COMPONENT

  • ELECTRON

  • HIGH_PRESSURE

  • HYDRATE

  • MOF

  • MOF_NO_DISORDER

  • MOF_1D

  • MOF_2D

  • MOF_3D

  • PESTICIDE

  • POLYMORPHIC

  • TEACHING

  • MINIMAL_DISORDER

  • SIGNIFICANT_DISORDER
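
When the subset is chosen at run time, for instance from a command-line argument, its name can be resolved against the Subsets class. This is a minimal sketch: subset_by_name is a hypothetical helper, it assumes only that subsets are reachable as class attributes in the way Subsets.MOF is above, and it does not inspect the attribute values.

```python
def subset_by_name(subsets, name):
    """Look up a subset attribute such as 'MOF' or 'mof_3d' on a Subsets-like class.

    Raises ValueError for empty, private or unknown names.
    """
    key = name.strip().upper()
    if not key or key.startswith('_') or not hasattr(subsets, key):
        raise ValueError('Unknown CSD subset: {!r}'.format(name))
    return getattr(subsets, key)
```

A reader could then be created with EntryReader(subset=subset_by_name(Subsets, 'mof')), which is equivalent to passing Subsets.MOF directly.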