Working with entries

Introduction

The ccdc.entry module contains the ccdc.entry.Entry class which represents a database entry.

Typically a database entry will be read from the CSD. Let us therefore import the ccdc.io module and read in the first entry from the CSD.

>>> from ccdc import io
>>> csd_reader = io.EntryReader('CSD')
>>> first_csd_entry = csd_reader[0]

Accessing entry properties

Let us have a look at the crystallographic properties available to us from a CSD crystal structure.

First of all it is worth noting that one can get access to the underlying ccdc.crystal.Crystal and ccdc.molecule.Molecule from the entry.

>>> mol = first_csd_entry.crystal.molecule
>>> print(mol.identifier)
AABHTZ

Some entries in the CSD exhibit disorder. By default the ccdc.entry.Entry.molecule will suppress disordered atoms. For example, ‘ABABUB’ in the CSD is such an entry:

>>> ababub = csd_reader.entry('ABABUB')
>>> mol = ababub.molecule
>>> print(len(mol.atoms))
30
>>> print(mol.formula)
C12 H15 N1 O2

A convention within the CSD is that disordered atoms have labels ending with a ‘?’. The full molecule, including the suppressed atoms, can be retrieved:

>>> disordered_mol = ababub.disordered_molecule
>>> print(len(disordered_mol.atoms))
42
>>> print(disordered_mol.formula)
C12 H15 N1 O2
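
The label convention makes it straightforward to pick out the suppressed atoms. The following is a minimal sketch rather than part of the API: is_disordered_label is a hypothetical helper, and the labels are invented for illustration. With a real entry they would be gathered from the atoms of disordered_mol, assuming each atom exposes its label as a string.

```python
def is_disordered_label(label):
    """Return True if an atom label follows the '?' suffix convention."""
    return label.endswith('?')


# Hypothetical labels for illustration only; with a real entry these would
# be collected from the atoms of disordered_mol (assumed: one label per atom).
labels = ['C1', 'C2', 'C3?', 'H3?', 'N1', 'O1']

# Keep only the labels that mark suppressed (disordered) atoms.
suppressed = [label for label in labels if is_disordered_label(label)]
print(suppressed)
```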

For entries that have been accessed from the CSD, one can also access the chemical name and formula of the crystal content.

>>> print(first_csd_entry.chemical_name)
4-Acetoamido-3-(1-acetyl-2-(2,6-dichlorobenzylidene)hydrazine)-1,2,4-triazole
>>> print(first_csd_entry.formula)
C13 H12 Cl2 N6 O2

Let us illustrate some more properties available for CSD entries using a crystal structure of ibuprofen.

>>> ibuprofen = csd_reader.entry('IBPRAC')
>>> print(ibuprofen.has_3d_structure)
True
>>> print(ibuprofen.has_disorder)
False
>>> print(ibuprofen.is_organometallic)
False
>>> print(ibuprofen.is_polymeric)
False
>>> print(ibuprofen.bioactivity)
analgesic and antiinflammatory agent
>>> print('\n'.join(ibuprofen.synonyms))
Ibuprofen
Advil
Motrin
Nurofen
DrugBank: DB01050
PDB Chemical Component code: IZP

This particular CSD entry does not have a publication DOI.

>>> print(ibuprofen.publication.doi)
None

However, where a DOI is defined it will be returned.

>>> print(csd_reader.entry('ABEBUF').publication.doi)
10.1021/cg049957u
>>> print(csd_reader.entry('ABEBUF').publication)  
Citation(authors='S.W.Gordon-Wylie, E.Teplin, J.C.Morris, M.I.Trombley,
                   S.M.McCarthy, W.M.Cleaver, G.R.Clark',
         journal='Journal(Crystal Growth and Design)', volume='4', year=2004,
         first_page='789', doi='10.1021/cg049957u')

A publication is returned as a named tuple whose members may be retrieved by name or index. Its fields are authors, journal, volume, year, first_page and, where present, the publication’s doi.

>>> print(ibuprofen.publication)  
Citation(authors='J.F.McConnell',
         journal='Journal(Crystal Structure Communications)', volume='3', year=1974,
         first_page='73', doi=None)
>>> print(ibuprofen.publication.authors)
J.F.McConnell
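
Access by name and by index can be illustrated without a CSD installation. The sketch below builds a stand-in named tuple using the field names visible in the printed Citation above. It is not the class defined by the API, and the journal is stored as a plain string purely for illustration.

```python
from collections import namedtuple

# Stand-in with the field names shown in the printed Citation above;
# this is not the class defined by the CSD Python API.
Citation = namedtuple(
    'Citation', ['authors', 'journal', 'volume', 'year', 'first_page', 'doi'])

citation = Citation(
    authors='J.F.McConnell',
    journal='Journal(Crystal Structure Communications)',
    volume='3', year=1974, first_page='73', doi=None)

# Access by name and by index return the same member.
print(citation.year)
print(citation[3])

# A named tuple also unpacks like an ordinary tuple.
authors, journal, volume, year, first_page, doi = citation
print(authors)
```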

From a CSD entry it is also possible to obtain the colour, melting point, polymorphic form description, disorder details, radiation source and deposition date, when this information is available in the underlying CSD entry.

>>> print(ibuprofen.color)
None
>>> print(ibuprofen.melting_point)
None
>>> print(ibuprofen.polymorph)
polymorph 1
>>> print(ibuprofen.disorder_details)
None
>>> print(csd_reader.entry('ABABUB').disorder_details)
The cyclohexene ring is disordered over two sites with occupancies 0.5878:0.4122.
>>> print(csd_reader.entry('ABINOR01').radiation_source)
Neutron
>>> print(ibuprofen.deposition_date)
1974-06-21

The entry for ibuprofen has no editorial comments; where they are present they may represent editorial decisions made during structural curation, or patent information:

>>> print(ibuprofen.remarks)
None
>>> print(csd_reader.entry('ABAPCU').remarks)
The position of the hydrate is dubious. It has been deleted
>>> print(csd_reader.entry('ARISOK').remarks)
U.S. Patent: US 6858644 B2

An indication of whether an experiment was performed at non-ambient pressure is available. This is given as a string, and the units have not yet been normalised. Where this is None, the experiment was performed at ambient pressure:

>>> print(ibuprofen.pressure)
None
>>> print(csd_reader.entry('ABULIT03').pressure)
1.4 GPa
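
Because the pressure is an unnormalised string, any numeric use requires parsing it first. The following is a minimal sketch under stated assumptions: pressure_in_gpa is a hypothetical helper, and the unit table is an assumption, since only the form '1.4 GPa' is shown above. Strings that do not fit the simple "number unit" pattern are returned as None rather than guessed at.

```python
import re

# Assumed conversion factors to GPa. The set of units that actually occurs
# in the CSD is not documented here, so extend this table as needed.
_TO_GPA = {'gpa': 1.0, 'mpa': 1e-3, 'kbar': 0.1, 'bar': 1e-4}

# A plain decimal number followed by an alphabetic unit, e.g. '1.4 GPa'.
_PRESSURE_RE = re.compile(r'^\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*$')


def pressure_in_gpa(text):
    """Return the pressure in GPa, or None if text is None or not understood."""
    if text is None:
        return None  # None means ambient pressure in the entry property
    match = _PRESSURE_RE.match(text)
    if match is None:
        return None  # free-form text that this sketch does not handle
    value, unit = match.groups()
    factor = _TO_GPA.get(unit.lower())
    if factor is None:
        return None  # unit not in the assumed table
    return float(value) * factor


print(pressure_in_gpa('1.4 GPa'))
```

Note that this sketch returns None both for ambient pressure and for strings it cannot parse; callers that need to tell the two apart should check the original string first.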

A boolean property indicates whether or not the crystallographic determination was performed in a powder study:

>>> print(ibuprofen.is_powder_study)
False
>>> print(csd_reader.entry('ACATAA').is_powder_study)
True

Within the CSD many structures are cross-referenced. These can be obtained and inspected as follows:

>>> cross_refs = ibuprofen.cross_references
>>> print(cross_refs)
(CrossReference(for stereoisomer see [JEKNOC]),)
>>> xref = cross_refs[0]
>>> print(xref.text)
for stereoisomer see [JEKNOC]
>>> print(xref.type)
Stereoisomer
>>> print(xref.scope)
Family
>>> print(xref.identifiers)
('JEKNOC',)

These properties would not be available for an entry read in from, for example, a Mol2 file.

>>> filepath = 'ABEBUF.mol2'

To get access to the entry in this file we make use of a ccdc.io.EntryReader.

>>> entry_reader = io.EntryReader(filepath)
>>> entry_from_mol2 = entry_reader[0]
>>> entry_reader.close()
>>> print(entry_from_mol2.chemical_name)
None
>>> print(entry_from_mol2.formula)
C19 H15 N3 O2

Where the database comes from an SDF file the EntryReader will give access to the SDF tags for the entry via the entry.attributes property:

>>> file_name = 'gold_output.sdf'

As before, we make use of a ccdc.io.EntryReader to get access to the entry in this file.

>>> reader = io.EntryReader(file_name)
>>> entry_from_sdf = reader[0]
>>> for k, v in sorted(entry_from_sdf.attributes.items()):
...     print(k)
...     print(v)
...
Gold.Chemscore.DEClash
17.1265
Gold.Chemscore.DEClash.Weighted
17.1265
Gold.Chemscore.DEInternal
2.1334
Gold.Chemscore.DEInternal.Weighted
2.1334
Gold.Chemscore.DG
-34.5350
Gold.Chemscore.Fitness
15.2752
Gold.Chemscore.Hbond
0.9975
Gold.Chemscore.Hbond.Weighted
-3.3317
Gold.Chemscore.Internal_Hbond
0.0000
Gold.Chemscore.Internal_Hbond.Weighted
0.0000
Gold.Chemscore.Lipo
257.3928
Gold.Chemscore.Lipo.Weighted
-30.1150
Gold.Chemscore.Metal
0.0000
Gold.Chemscore.Metal.Weighted
0.0000
Gold.Chemscore.Rot
1.7155
Gold.Chemscore.Rot.Weighted
4.3916
Gold.Chemscore.ZeroCoef
-5.4800
Gold.Protein.ActiveResidues
 PHE87   TYR96   PHE98   THR101  MET184  THR185  LEU244  VAL247  GLY248  THR252
 VAL295  ASP297  ILE395  VAL396  HEM1
Gold.Protein.RotatedAtoms
   42.8596   40.6557   11.7180 H   0  0  0  0  0  0  0  0  0  0  0  0  # atno 6359 bound_to 3214
   39.9356   46.2719   13.3012 H   0  0  0  0  0  0  0  0  0  0  0  0  # atno 6443 bound_to 742
>>> reader.close()

The values of the attributes dict are all strings: it is left to the user to convert as appropriate.
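
One possible way to handle the conversion is sketched below. It is not part of the API: to_numeric_attributes is a hypothetical helper, and the sample dict copies a few tags from the output above in place of a real attributes dict. Values that parse as floats are converted; everything else is left as a string.

```python
def to_numeric_attributes(attributes):
    """Return a copy of attributes with numeric-looking strings as floats."""
    converted = {}
    for key, value in attributes.items():
        try:
            converted[key] = float(value)
        except (TypeError, ValueError):
            # Textual or multi-line tags (e.g. residue lists) stay unchanged.
            converted[key] = value
    return converted


# Stand-in for entry_from_sdf.attributes, using values shown above.
sample_attributes = {
    'Gold.Chemscore.Fitness': '15.2752',
    'Gold.Chemscore.DG': '-34.5350',
    'Gold.Protein.ActiveResidues': ' PHE87   TYR96   PHE98',
}

numeric = to_numeric_attributes(sample_attributes)
print(numeric['Gold.Chemscore.Fitness'])
```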

An entry may be constructed from a molecule with attributes constructed from arbitrary keyword parameters.

>>> from ccdc.entry import Entry
>>> aabhtz_entry = Entry.from_molecule(mol, annotation='First structure in CSD')

Attributes may also be added to an entry after it has been instantiated.

>>> aabhtz_entry.attributes['molecular_weight'] = mol.molecular_weight

Note that attributes of any value may be added to the entry. These attributes will be written to an SDF format file if an EntryWriter is used:

>>> with io.EntryWriter('aabhtz.sdf') as writer:
...     writer.write(aabhtz_entry)

Journals

The CSD holds information about the journals referenced by the entries in the database. This may be obtained by constructing an instance of ccdc.entry.JournalList. The collection can then be searched for journals of interest with ccdc.entry.JournalList.simple_search(), which returns every ccdc.entry.Journal whose full name, abbreviated name, or the translated form of either matches a string pattern. For example:

>>> import ccdc
>>> entry_reader = io.EntryReader('CSD')
>>> journals = ccdc.entry.JournalList(entry_reader)
>>> entry_reader.close()
>>> print(len(journals)) 
1877
>>> print(journals[0].name)
Acta Crystallogr.

Newer versions of the CSD will provide extended information in the ccdc.entry.Journal. These include:

  • full_name

  • abbreviated_name (also known as name, for compatibility)

  • translated_name for journals in a language other than English

  • abbreviated_translated_name

  • language_name

  • publisher_name

  • state (‘Discontinued’, ‘Current’ or ‘Unknown’)

  • start_year

  • end_year (for those journals which have ceased publication)

  • url (where known)

  • image_url (where known)

  • issn

  • eissn

  • ASTM international coden.

The ccdc.entry.JournalList allows searches by exact full name, or exact abbreviated name, by a regular expression on full name or abbreviated name, and a case-insensitive search by any of the name fields:

>>> print([str(j) for j in journals.simple_search('open med')])
['Journal(The Open Medicinal Chemistry Journal)']
>>> print(journals.by_abbreviated_name('Acta Crystallogr.'))
Journal(Acta Crystallographica [1948-1967])
>>> print([str(j) for j in journals.match_abbreviated_name(r'Acta\s*[Cc]rys.*')]) 
['Journal(Acta Crystallographica [1948-1967])', 'Journal(Acta Crystallographica,Section B:Struct.Crystallogr.Cryst.Chem. [1968-1982])', ...]
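
The difference between the three kinds of matching can be seen on a plain list of names. The sketch below is an illustration only and does not reproduce how JournalList is implemented: the helper functions are hypothetical, the list holds just the full names printed above, and anchoring the regular expression at the start of the name is an assumption.

```python
import re

# Full names as printed above; a stand-in for a JournalList.
full_names = [
    'Acta Crystallographica [1948-1967]',
    'Acta Crystallographica,Section B:Struct.Crystallogr.Cryst.Chem. [1968-1982]',
    'The Open Medicinal Chemistry Journal',
]


def exact(names, wanted):
    """Names equal to wanted, character for character."""
    return [name for name in names if name == wanted]


def by_regex(names, pattern):
    """Names matching pattern from the start of the string (assumed)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.match(name)]


def case_insensitive(names, fragment):
    """Names containing fragment, ignoring case."""
    fragment = fragment.lower()
    return [name for name in names if fragment in name.lower()]


print(case_insensitive(full_names, 'open med'))
print(len(by_regex(full_names, r'Acta\s*[Cc]rys.*')))
print(exact(full_names, 'Acta Crystallographica'))
```

In this sketch the exact search finds nothing for 'Acta Crystallographica' because the stored full names carry a date range, which is why the looser searches are often the more convenient starting point.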

When performing a ccdc.search.TextNumericSearch using a journal, the ccdc.entry.Journal may be used, or the ccdc.entry.Journal.name specified.