IO API

Introduction

Module for reading and writing of molecules, crystals and database entries.

There are three types of readers: MoleculeReader, CrystalReader and EntryReader. The latter is used to read in database entries. It can also read SDF files in which each entry’s attributes dictionary is stored as SD tags.

Retrieving database entries from the CSD:

# Creating a CSD entry reader, including any updates which may be present
csd_entry_reader = EntryReader('CSD')

# Similarly a set of in-house databases may be adjoined to the CSD by constructing readers over
# a list of files.

# Retrieve an entry based upon its index
first_csd_entry = csd_entry_reader[0]

# Access an entry, crystal or molecule by its identifier
abebuf_entry = csd_entry_reader.entry('ABEBUF')
abebuf_crystal = csd_entry_reader.crystal('ABEBUF')
abebuf_molecule = csd_entry_reader.molecule('ABEBUF')

# Loop over all CSD entries
for entry in csd_entry_reader:
    print(entry.identifier)

# Loop over all the molecules
for mol in csd_entry_reader.molecules():
    print(mol.smiles)
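
Looping over every entry of a large database can take a long time, so while experimenting it may help to look at only the first few items. The following is a minimal sketch, assuming only that a reader or its generators behave like ordinary Python iterables. The helper `take` and the stand-in generator `fake_identifiers` are hypothetical names introduced for illustration and are not part of ccdc.io.

```python
from itertools import islice


def take(iterable, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(iterable, n))


def fake_identifiers():
    """Stand-in generator so the sketch runs without the CSD installed.

    With the API available, something like csd_entry_reader.molecules()
    could be passed to take() instead (assumption, not executed here).
    """
    for refcode in ('AABHTZ', 'ABEBUF', 'ACSALA'):
        yield refcode


first_two = take(fake_identifiers(), 2)
print(first_two)
```

Because `islice` stops pulling from the generator once it has n items, the rest of the source is never visited.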

Accessing molecules from a file:

# Creating a molecule reader
mol_reader = MoleculeReader('my_molecules.mol2')

# Retrieve a molecule based upon its index
first_molecule = mol_reader[0]

# Loop over all molecules
for mol in mol_reader:
    print(mol.smiles)

There are three types of writers: MoleculeWriter, CrystalWriter and EntryWriter. The latter can be used to write out SDF files in which the entry’s attributes dictionary is formatted as SD tags. The writers inherit functionality from the private base class _DatabaseWriter.

Using a MoleculeWriter to write out a molecule:

with MoleculeWriter('abebuf.mol2') as mol_writer:
    mol_writer.write(abebuf_molecule)

API

CSD location and version number

ccdc.io.csd_directory()[source]

Return the directory containing the CSD.

ccdc.io.csd_version()[source]

Return the version of the CSD in use.

Readers

class ccdc.io._DatabaseReader(fname, db='')[source]

Base class for database readers.

Readers are context managers, supporting the syntax:

with MoleculeReader(filename) as filehandle:
    for mol in filehandle:
        print(mol.smiles)
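
For readers unfamiliar with the context manager protocol, the sketch below shows the general Python mechanism that makes the `with` syntax close a resource automatically. `ToyReader` is a hypothetical stand-in written for this illustration. It is not the ccdc.io implementation, whose internals are not shown here.

```python
class ToyReader:
    """Hypothetical stand-in for a reader; not part of ccdc.io."""

    def __init__(self, items):
        self._items = list(items)
        self.closed = False

    def __iter__(self):
        # Iterating over the reader yields its items in order.
        return iter(self._items)

    def close(self):
        # A real reader would release its file or database handle here.
        self.closed = True

    def __enter__(self):
        # The object returned here is bound to the name after "as".
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the with-block ends, even if an exception was raised.
        self.close()
        return False  # do not suppress exceptions


with ToyReader(['a', 'b']) as reader:
    seen = [item for item in reader]

was_closed = reader.closed
print(seen, was_closed)
```
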
close()[source]

Close the database.

crystal(id)[source]

Random access to crystals.

Parameters: id – ccdc.crystal.Crystal.identifier
Returns: ccdc.crystal.Crystal
crystals()[source]

Generator for crystals in the database.

entries()[source]

Generator for entries in the database.

entry(id)[source]

Random access to entries.

Parameters: id – ccdc.entry.Entry.identifier
Returns: ccdc.entry.Entry
identifier(i)[source]

Random access to identifiers.

Parameters: i – int index
Returns: str identifier
journals

The list of journals held in a database.

molecule(id)[source]

Random access to molecules.

Parameters: id – ccdc.molecule.Molecule.identifier
Returns: ccdc.molecule.Molecule
molecules()[source]

Generator for molecules of the database.

class ccdc.io.EntryReader[source]

Treat the database as a source of entries.

An EntryReader can be instantiated using:
  • The explicit string ‘CSD’, which defaults to the CSD.
  • A file name with an optional format argument. If the format argument is empty, the file format is inferred from the suffix of the file name.
  • A list of connection strings, to specify a pool.
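
The rule "use the format argument if given, otherwise fall back to the file suffix" can be sketched in a few lines. This is an illustration of the rule as described, not the library's own code. The function name `infer_format` and its `format` keyword are assumptions made for this example.

```python
import os


def infer_format(fname, format=''):
    """Return the explicit format if given, else the lowercased file suffix.

    Hypothetical helper for illustration; not part of ccdc.io.
    """
    if format:
        return format
    suffix = os.path.splitext(fname)[1]  # e.g. '.mol2'
    return suffix.lstrip('.').lower()


print(infer_format('my_molecules.mol2'))
print(infer_format('refcodes.txt', format='identifiers'))
```

An explicit format is useful when the suffix is misleading, such as a plain `.txt` file that actually holds a list of identifiers.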

One of the supported file formats is ‘identifiers’, in which case the file is assumed to contain a newline-separated list of CSD refcodes. Such a file may have the suffix ‘.gcd’.

During initialisation a _DatabaseReader is dynamically bound to the EntryReader instance, which means that the methods of _DatabaseReader are available from the EntryReader instance.

>>> csd_entry_reader = EntryReader('CSD')
>>> type(csd_entry_reader[0])
<class 'ccdc.entry.Entry'>
>>> print(csd_entry_reader.identifier(0))
AABHTZ
>>> aabhtz_entry = csd_entry_reader.entry('AABHTZ')
>>> print(aabhtz_entry.publication.authors)
P.-E.Werner
class ccdc.io.CrystalReader[source]

Treat the database as a source of crystals.

A CrystalReader can be instantiated using:
  • The explicit string ‘CSD’, which defaults to the CSD.
  • A file name with an optional format argument. If the format argument is empty, the file format is inferred from the suffix of the file name.

One of the supported file formats is ‘identifiers’, in which case the file is assumed to contain a newline-separated list of CSD refcodes. Such a file may have the suffix ‘.gcd’.
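
As a minimal sketch of what an ‘identifiers’ file looks like, the snippet below writes two refcodes to a temporary file with a ‘.gcd’ suffix, one per line, and reads them back with plain Python. The file name `selection.gcd` is an arbitrary choice for this example. Passing the resulting path to a reader is mentioned only in a comment, since that step needs the CSD Python API and a CSD installation.

```python
import os
import tempfile

# Refcodes taken from the examples in this document.
refcodes = ['AABHTZ', 'ABEBUF']

# Write one refcode per line into a file with the '.gcd' suffix.
gcd_path = os.path.join(tempfile.mkdtemp(), 'selection.gcd')
with open(gcd_path, 'w') as handle:
    handle.write('\n'.join(refcodes) + '\n')

# Read the file back, skipping any blank lines.
with open(gcd_path) as handle:
    read_back = [line.strip() for line in handle if line.strip()]

print(read_back)

# With the API installed, gcd_path could then be handed to a reader,
# for example CrystalReader(gcd_path) (not executed in this sketch).
```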

During initialisation a _DatabaseReader is dynamically bound to the CrystalReader instance, which means that the methods of _DatabaseReader are available from the CrystalReader instance.

>>> csd_crystal_reader = CrystalReader('CSD')
>>> type(csd_crystal_reader[0])
<class 'ccdc.crystal.Crystal'>
>>> print(csd_crystal_reader.identifier(0))
AABHTZ
>>> aabhtz_crystal = csd_crystal_reader.crystal('AABHTZ')
>>> print(aabhtz_crystal.crystal_system)
triclinic
class ccdc.io.MoleculeReader[source]

Treat the database as a source of molecules.

A MoleculeReader can be instantiated using:
  • The explicit string ‘CSD’, which defaults to the CSD.
  • A file name with an optional format argument. If the format argument is empty, the file format is inferred from the suffix of the file name.

One of the supported file formats is ‘identifiers’, in which case the file is assumed to contain a newline-separated list of CSD refcodes. Such a file may have the suffix ‘.gcd’.

During initialisation a _DatabaseReader is dynamically bound to the MoleculeReader instance, which means that the methods of _DatabaseReader are available from the MoleculeReader instance.
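
One common way to make another object's methods available on an instance is attribute delegation. The sketch below shows that general Python technique with two hypothetical classes, `_ToyBackend` and `ToyFacade`. It is an assumption-laden illustration of the idea and does not claim to reproduce how ccdc.io performs its dynamic binding.

```python
class _ToyBackend:
    """Hypothetical backend holding the shared access methods."""

    def __init__(self, identifiers):
        self._identifiers = list(identifiers)

    def identifier(self, i):
        # Random access to an identifier by integer index.
        return self._identifiers[i]


class ToyFacade:
    """Hypothetical front-end that forwards unknown attributes."""

    def __init__(self, identifiers):
        self._backend = _ToyBackend(identifiers)

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails.
        if name == '_backend':
            # Guard against recursion if _backend has not been set yet.
            raise AttributeError(name)
        return getattr(self._backend, name)


facade = ToyFacade(['AABHTZ', 'ABEBUF'])
print(facade.identifier(0))
```

The caller never touches the backend directly, which is why the base-class methods appear to belong to the front-end object.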

>>> csd_molecule_reader = MoleculeReader('CSD')
>>> type(csd_molecule_reader[0])
<class 'ccdc.molecule.Molecule'>
>>> print(csd_molecule_reader.identifier(0))
AABHTZ
>>> aabhtz_molecule = csd_molecule_reader.molecule('AABHTZ')
>>> print(aabhtz_molecule.smiles)
CC(=O)NN1C=NN=C1N(N=Cc1c(Cl)cccc1Cl)C(C)=O

Writers

class ccdc.io._DatabaseWriter(fname, append=False)[source]

Base class for database formats.

Parameters:
  • fname – The filename of the database to create or open.
  • append – Append to the database when True, rather than replace it.

Writers are context managers, supporting the syntax:

with MoleculeWriter('output.mol2', append=True) as filehandle:
    filehandle.write(mol)
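
The difference between appending and replacing can be illustrated with a plain text file. `ToyWriter` below is a hypothetical class written for this sketch. It mimics only the `append` flag and the context manager behaviour described above, using ordinary file modes, and is not the ccdc.io writer.

```python
import os
import tempfile


class ToyWriter:
    """Hypothetical line writer; not part of ccdc.io."""

    def __init__(self, fname, append=False):
        # 'a' keeps existing content, 'w' truncates the file first.
        self._handle = open(fname, 'a' if append else 'w')

    def write(self, text):
        self._handle.write(text + '\n')

    def close(self):
        self._handle.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions


path = os.path.join(tempfile.mkdtemp(), 'out.txt')

with ToyWriter(path) as writer:
    writer.write('first')
with ToyWriter(path, append=True) as writer:
    writer.write('second')
with open(path) as handle:
    appended = handle.read().splitlines()

# Opening again without append replaces the earlier content.
with ToyWriter(path) as writer:
    writer.write('third')
with open(path) as handle:
    replaced = handle.read().splitlines()

print(appended, replaced)
```
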
close()[source]

Close the database.

remove()[source]

Remove the file if it exists.

write_crystal(c)[source]

Appends a crystal to the database to be written out.

Parameters: c – ccdc.crystal.Crystal
write_entry(e)[source]

Appends an entry to the database to be written out.

Parameters: e – ccdc.entry.Entry
write_molecule(m)[source]

Appends a molecule to the database to be written out.

Parameters: m – ccdc.molecule.Molecule
class ccdc.io.EntryWriter[source]

Writes database entries by default.

write(e)[source]

Write the entry.

Parameters: e – ccdc.entry.Entry
class ccdc.io.CrystalWriter[source]

Writes crystals by default.

write(c)[source]

Write the crystal.

Parameters: c – ccdc.crystal.Crystal
class ccdc.io.MoleculeWriter[source]

Writes molecules by default.

write(m)[source]

Write the molecule.

Parameters: m – ccdc.molecule.Molecule