Working with entries
Introduction
The ccdc.entry module contains the ccdc.entry.Entry class, which represents a database entry.
Typically a database entry is read from the CSD, so let us import the ccdc.io module and read in the first entry from the CSD.
>>> from ccdc import io
>>> csd_reader = io.EntryReader('CSD')
>>> first_csd_entry = csd_reader[0]
Accessing entry properties
Let us look at the crystallographic properties available from a CSD crystal structure.
First of all, note that the underlying ccdc.crystal.Crystal and ccdc.molecule.Molecule can be accessed from the entry.
>>> mol = first_csd_entry.crystal.molecule
>>> print(mol.identifier)
AABHTZ
Some entries in the CSD exhibit disorder. By default, ccdc.entry.Entry.molecule suppresses disordered atoms. For example, ‘ABABUB’ in the CSD is such an entry:
>>> ababub = csd_reader.entry('ABABUB')
>>> mol = ababub.molecule
>>> print(len(mol.atoms))
30
>>> print(mol.formula)
C12 H15 N1 O2
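The formula string lists each element followed by its count, separated by spaces. As a sketch, assuming only plain `<Element><count>` tokens (no charges, multipliers or bracketed polymer units, which some CSD formulas do contain), the string can be parsed in plain Python. The helper `parse_formula` is hypothetical and not part of the ccdc API. For ABABUB the counts sum to 30, matching the number of atoms in the ordered molecule above.

```python
import re
from collections import Counter

# Matches tokens such as "C12", "Cl2" or "N" (an omitted count means 1).
_TOKEN = re.compile(r"^([A-Z][a-z]?)(\d*)$")


def parse_formula(formula):
    """Parse a space-separated formula such as 'C12 H15 N1 O2' into counts.

    Hypothetical helper: only simple element/count tokens are handled;
    charges, multipliers and bracketed groups raise ValueError.
    """
    counts = Counter()
    for token in formula.split():
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unsupported formula token: {token!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
    return counts


counts = parse_formula("C12 H15 N1 O2")
print(sum(counts.values()))  # 30, the atom count of the ordered ABABUB molecule
```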
A convention within the CSD is that disordered atoms have labels ending with a ‘?’. The full molecule, including the suppressed atoms, can be retrieved:
>>> disordered_mol = ababub.disordered_molecule
>>> print(len(disordered_mol.atoms))
42
>>> print(disordered_mol.formula)
C12 H15 N1 O2
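To make the ‘?’ labelling convention concrete, here is a sketch that splits a plain list of atom labels into kept and suppressed atoms. The labels are hypothetical, not taken from ABABUB; with the ccdc API they could come, for example, from the `label` of each atom in `disordered_mol.atoms`. In practice the split is already done for you by `molecule` and `disordered_molecule`.

```python
def split_by_disorder_label(labels):
    """Split atom labels into (kept, suppressed) using the CSD convention
    that suppressed disordered atoms have labels ending in '?'."""
    kept = [label for label in labels if not label.endswith("?")]
    suppressed = [label for label in labels if label.endswith("?")]
    return kept, suppressed


# Hypothetical labels for a ring disordered over two sites.
labels = ["C1", "C2", "C3", "C2?", "C3?", "H2A", "H2A?"]
kept, suppressed = split_by_disorder_label(labels)
print(kept)        # ['C1', 'C2', 'C3', 'H2A']
print(suppressed)  # ['C2?', 'C3?', 'H2A?']
```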
For entries read from the CSD, the chemical name and formula of the crystal contents are also available:
>>> print(first_csd_entry.chemical_name)
4-Acetoamido-3-(1-acetyl-2-(2,6-dichlorobenzylidene)hydrazine)-1,2,4-triazole
>>> print(first_csd_entry.formula)
C13 H12 Cl2 N6 O2
Let us illustrate some more properties available for CSD entries using a crystal structure of ibuprofen.
>>> ibuprofen = csd_reader.entry('IBPRAC')
>>> print(ibuprofen.has_3d_structure)
True
>>> print(ibuprofen.has_disorder)
False
>>> print(ibuprofen.is_organometallic)
False
>>> print(ibuprofen.is_polymeric)
False
>>> print(ibuprofen.bioactivity)
analgesic and antiinflammatory agent
>>> print('\n'.join(ibuprofen.synonyms))
Ibuprofen
Advil
Motrin
Nurofen
DrugBank: DB01050
PDB Chemical Component code: IZP
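Some synonyms are trade names, while others, as in the output above, are external database identifiers written as `<source>: <identifier>`. A sketch that separates the two, assuming that any synonym containing ": " is such an identifier (a heuristic, not a documented rule); the helper `split_synonyms` is hypothetical and is applied to the synonyms printed for IBPRAC:

```python
def split_synonyms(synonyms):
    """Separate plain names from 'Source: identifier' style synonyms."""
    names, identifiers = [], {}
    for synonym in synonyms:
        source, sep, value = synonym.partition(": ")
        if sep:
            identifiers[source] = value
        else:
            names.append(synonym)
    return names, identifiers


# The synonyms printed for IBPRAC above.
synonyms = ["Ibuprofen", "Advil", "Motrin", "Nurofen",
            "DrugBank: DB01050", "PDB Chemical Component code: IZP"]
names, identifiers = split_synonyms(synonyms)
print(names)        # ['Ibuprofen', 'Advil', 'Motrin', 'Nurofen']
print(identifiers)  # {'DrugBank': 'DB01050', 'PDB Chemical Component code': 'IZP'}
```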
This particular CSD entry does not have a publication DOI.
>>> print(ibuprofen.publication.doi)
None
However, where a DOI is defined, it is returned:
>>> abebuf = csd_reader.entry('ABEBUF')
>>> print(abebuf.publication.doi)
10.1021/cg049957u
>>> print(abebuf.publication)
Citation(authors='S.W.Gordon-Wylie, E.Teplin, J.C.Morris, M.I.Trombley,
S.M.McCarthy, W.M.Cleaver, G.R.Clark',
journal='Journal(Crystal Growth and Design)', volume='4', year=2004,
first_page='789', doi='10.1021/cg049957u')
A publication is returned as a named tuple whose members may be retrieved by name or by index: authors, journal, volume, year, first_page (the number of the first page) and doi (where present).
>>> print(ibuprofen.publication)
Citation(authors='J.F.McConnell',
journal='Journal(Crystal Structure Communications)', volume='3', year=1974,
first_page='73', doi=None)
>>> print(ibuprofen.publication.authors)
J.F.McConnell
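Because a citation is a named tuple, it supports both attribute and index access. The sketch below builds a stand-in with the field names shown in the printed repr above; the field order is assumed from that repr, the journal is a plain string here (in the ccdc API it is a Journal object), and `format_reference` is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in with the field names shown in the printed Citation repr above;
# this is not ccdc's own class, and journal is a plain string here.
Citation = namedtuple("Citation",
                      ["authors", "journal", "volume", "year", "first_page", "doi"])


def format_reference(citation):
    """Format a citation as 'Authors, Journal, Volume (Year) Page[, doi:...]'."""
    text = (f"{citation.authors}, {citation.journal}, "
            f"{citation.volume} ({citation.year}) {citation.first_page}")
    if citation.doi is not None:
        text += f", doi:{citation.doi}"
    return text


ibuprofen_citation = Citation("J.F.McConnell", "Crystal Structure Communications",
                              "3", 1974, "73", None)
print(ibuprofen_citation.year == ibuprofen_citation[3])  # True: by name or by index
print(format_reference(ibuprofen_citation))
# J.F.McConnell, Crystal Structure Communications, 3 (1974) 73
```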
From a CSD entry it is also possible to obtain the colour, melting point, polymorph description, disorder details, the radiation source used for the structure determination, and the deposition date, when this information is available in the underlying CSD entry.
>>> print(ibuprofen.color)
None
>>> print(ibuprofen.melting_point)
None
>>> print(ibuprofen.polymorph)
polymorph 1
>>> print(ibuprofen.disorder_details)
None
>>> ababub = csd_reader.entry('ABABUB')
>>> print(ababub.disorder_details)
The cyclohexene ring is disordered over two sites with occupancies 0.5878:0.4122.
>>> abinor01 = csd_reader.entry('ABINOR01')
>>> print(abinor01.radiation_source)
Neutron
>>> print(ibuprofen.deposition_date)
1974-06-21
The entry for ibuprofen has no editorial remarks. Where remarks are present, they may record editorial decisions made during structural curation, or patent information:
>>> print(ibuprofen.remarks)
None
>>> abapcu = csd_reader.entry('ABAPCU')
>>> print(abapcu.remarks)
The position of the hydrate is dubious. It has been deleted
>>> arisok = csd_reader.entry('ARISOK')
>>> print(arisok.remarks)
U.S. Patent: US 6858644 B2
An entry also indicates whether the experiment was performed at non-ambient pressure. The pressure is given as a string whose units have not been normalised; where it is None, the experiment was performed at ambient pressure:
>>> print(ibuprofen.pressure)
None
>>> abulit03 = csd_reader.entry('ABULIT03')
>>> print(abulit03.pressure)
1.4 GPa
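Since the pressure string is not normalised, comparing entries means parsing it yourself. A minimal sketch, assuming the string is a number followed by one of a small, assumed set of units (GPa, MPa, kPa, bar, atm); None (ambient) maps to 1 atm, and strings in any other form return None so they can be checked by hand. The helper `pressure_in_gpa` is hypothetical.

```python
import re

# Conversion factors to GPa for the units this sketch recognises (an assumed set).
_TO_GPA = {"GPa": 1.0, "MPa": 1e-3, "kPa": 1e-6, "bar": 1e-4, "atm": 1.01325e-4}
AMBIENT_GPA = 1.01325e-4  # 1 atm, used when the pressure property is None
_PRESSURE = re.compile(r"^\s*(\d*\.?\d+)\s*([A-Za-z]+)\s*$")


def pressure_in_gpa(pressure):
    """Convert a pressure string such as '1.4 GPa' to a float in GPa.

    None (ambient pressure) maps to 1 atm; strings in an unrecognised
    form or unit return None so they can be inspected manually.
    """
    if pressure is None:
        return AMBIENT_GPA
    match = _PRESSURE.match(pressure)
    if match is None:
        return None
    value, unit = match.groups()
    factor = _TO_GPA.get(unit)
    return None if factor is None else float(value) * factor


print(pressure_in_gpa("1.4 GPa"))  # 1.4
```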
A boolean property indicates whether or not the crystallographic determination was performed in a powder study:
>>> print(ibuprofen.is_powder_study)
False
>>> acataa = csd_reader.entry('ACATAA')
>>> print(acataa.is_powder_study)
True
Within the CSD many structures are cross-referenced. These can be obtained and inspected as follows:
>>> cross_refs = ibuprofen.cross_references
>>> print(cross_refs)
(CrossReference(for stereoisomer see [JEKNOC]),)
>>> xref = cross_refs[0]
>>> print(xref.text)
for stereoisomer see [JEKNOC]
>>> print(xref.type)
Stereoisomer
>>> print(xref.scope)
Family
>>> print(xref.identifiers)
('JEKNOC',)
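The cross-reference text embeds the related refcodes in square brackets, and the identifiers property already extracts them. If only the text is at hand (for example, after exporting it), a regular-expression sketch can recover the same tuple. It assumes refcodes are six uppercase letters optionally followed by two digits, as in JEKNOC or ABINOR01; the helper `refcodes_in_text` is hypothetical.

```python
import re

_BRACKETED = re.compile(r"\[([^\]]*)\]")
_REFCODE = re.compile(r"\b[A-Z]{6}(?:\d{2})?\b")


def refcodes_in_text(text):
    """Return the refcodes found inside square brackets, in order of appearance."""
    found = []
    for group in _BRACKETED.findall(text):
        found.extend(_REFCODE.findall(group))
    return tuple(found)


print(refcodes_in_text("for stereoisomer see [JEKNOC]"))  # ('JEKNOC',)
```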
These properties are not available for an entry read from, for example, a Mol2 file.
>>> filepath = 'ABEBUF.mol2'
To access the entry in this file we use a ccdc.io.EntryReader:
>>> entry_reader = io.EntryReader(filepath)
>>> entry_from_mol2 = entry_reader[0]
>>> entry_reader.close()
>>> print(entry_from_mol2.chemical_name)
None
>>> print(entry_from_mol2.formula)
C19 H15 N3 O2
Where the entries are read from an SDF file, the EntryReader gives access to the SDF tags of each entry via the entry.attributes property:
>>> file_name = 'gold_output.sdf'
Again, the entry in this file is read with a ccdc.io.EntryReader:
>>> reader = io.EntryReader(file_name)
>>> entry_from_sdf = reader[0]
>>> for k, v in sorted(entry_from_sdf.attributes.items()):
...     print(k)
...     print(v)
...
Gold.Chemscore.DEClash
17.1265
Gold.Chemscore.DEClash.Weighted
17.1265
Gold.Chemscore.DEInternal
2.1334
Gold.Chemscore.DEInternal.Weighted
2.1334
Gold.Chemscore.DG
-34.5350
Gold.Chemscore.Fitness
15.2752
Gold.Chemscore.Hbond
0.9975
Gold.Chemscore.Hbond.Weighted
-3.3317
Gold.Chemscore.Internal_Hbond
0.0000
Gold.Chemscore.Internal_Hbond.Weighted
0.0000
Gold.Chemscore.Lipo
257.3928
Gold.Chemscore.Lipo.Weighted
-30.1150
Gold.Chemscore.Metal
0.0000
Gold.Chemscore.Metal.Weighted
0.0000
Gold.Chemscore.Rot
1.7155
Gold.Chemscore.Rot.Weighted
4.3916
Gold.Chemscore.ZeroCoef
-5.4800
Gold.Protein.ActiveResidues
PHE87 TYR96 PHE98 THR101 MET184 THR185 LEU244 VAL247 GLY248 THR252
VAL295 ASP297 ILE395 VAL396 HEM1
Gold.Protein.RotatedAtoms
42.8596 40.6557 11.7180 H 0 0 0 0 0 0 0 0 0 0 0 0 # atno 6359 bound_to 3214
39.9356 46.2719 13.3012 H 0 0 0 0 0 0 0 0 0 0 0 0 # atno 6443 bound_to 742
>>> reader.close()
The values of the attributes dict are all strings: it is left to the user to convert as appropriate.
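A sketch of that conversion step, assuming numeric-looking tags should become floats and everything else should stay as text. The helpers `to_number_or_text` and `convert_attributes` are hypothetical; here they are applied to a few of the string values printed above, and the same call could be made on `entry_from_sdf.attributes`.

```python
def to_number_or_text(value):
    """Convert a tag value to float where possible, otherwise return it unchanged.

    Note that strings such as 'nan' or 'inf' are also accepted by float().
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def convert_attributes(attributes):
    """Return a new dict with numeric-looking SDF tag values as floats."""
    return {key: to_number_or_text(value) for key, value in attributes.items()}


attributes = {
    "Gold.Chemscore.Fitness": "15.2752",
    "Gold.Chemscore.DG": "-34.5350",
    "Gold.Protein.ActiveResidues": "PHE87 TYR96 PHE98",
}
converted = convert_attributes(attributes)
print(converted["Gold.Chemscore.Fitness"] > 15)  # True
```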
An entry may be constructed from a molecule with Entry.from_molecule; any additional keyword arguments become attributes of the entry.
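>>> from ccdc.entry import Entry
>>> # mol was reassigned to ABABUB above, so select the AABHTZ molecule again
>>> mol = first_csd_entry.molecule
>>> aabhtz_entry = Entry.from_molecule(mol, annotation='First structure in CSD')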
Attributes may also be added to an entry after it has been instantiated.
>>> aabhtz_entry.attributes['molecular_weight'] = mol.molecular_weight
Attributes may hold values of any type. They are written as SDF tags when the entry is written to an SDF file with an EntryWriter:
>>> with io.EntryWriter('aabhtz.sdf') as writer:
...     writer.write(aabhtz_entry)
...
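In an SDF file each attribute becomes a data item: a header line with the tag name in angle brackets, the value on the following line(s), then a blank line. The sketch below renders a dict in that layout using `str()` on each value; how the ccdc EntryWriter itself converts non-string values is not covered here, so that conversion is an assumption. The helper `sdf_data_items` is hypothetical, and the molecular weight is a placeholder, not the real value for AABHTZ.

```python
def sdf_data_items(attributes):
    """Render attributes as SDF data items: '>  <tag>', value line(s), blank line."""
    lines = []
    for tag, value in attributes.items():
        lines.append(f">  <{tag}>")
        # Multi-line values keep one SDF line per text line.
        lines.extend(str(value).splitlines() or [""])
        lines.append("")  # a blank line terminates each data item
    return "".join(line + "\n" for line in lines)


block = sdf_data_items({"annotation": "First structure in CSD",
                        "molecular_weight": 123.45})  # placeholder weight
print(block, end="")
```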
Journals
The CSD contains information about the journals referenced by its entries. This information may be obtained by constructing an instance of ccdc.entry.JournalList. The collection can then be searched with ccdc.entry.JournalList.simple_search(), which returns every ccdc.entry.Journal whose full name, abbreviated name, or translated form of either matches a string pattern. For example:
>>> import ccdc
>>> entry_reader = io.EntryReader('CSD')
>>> journals = ccdc.entry.JournalList(entry_reader)
>>> entry_reader.close()
>>> print(len(journals))
1877
>>> print(journals[0].name)
Acta Crystallogr.
Newer versions of the CSD provide extended information in ccdc.entry.Journal, including:
full_name
abbreviated_name (also known as name, for compatibility)
translated_name for journals in a language other than English
abbreviated_translated_name
language_name
publisher_name
state (‘Discontinued’, ‘Current’ or ‘Unknown’)
start_year
end_year (for those journals which have ceased publication)
url (where known)
image_url (where known)
issn
eissn
ASTM International CODEN
The ccdc.entry.JournalList allows searches by exact full name or exact abbreviated name, by a regular expression on the full or abbreviated name, and by a case-insensitive search over any of the name fields:
>>> print([str(j) for j in journals.simple_search('open med')])
['Journal(The Open Medicinal Chemistry Journal)']
>>> print(journals.by_abbreviated_name('Acta Crystallogr.'))
Journal(Acta Crystallographica [1948-1967])
>>> print([str(j) for j in journals.match_abbreviated_name(r'Acta\s*[Cc]rys.*')])
['Journal(Acta Crystallographica [1948-1967])', 'Journal(Acta Crystallographica,Section B:Struct.Crystallogr.Cryst.Chem. [1968-1982])', ...]
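To make the three kinds of search concrete, here is a sketch that mimics them over plain records holding only two of the name fields. The records, the helper functions, and the choice of `re.match` (anchored at the start of the name) for the regular-expression search are all assumptions; the ccdc JournalList methods are what you would use in practice.

```python
import re

# Hypothetical records with two of the Journal name fields.
JOURNALS = [
    {"full_name": "Acta Crystallographica [1948-1967]",
     "abbreviated_name": "Acta Crystallogr."},
    {"full_name": "The Open Medicinal Chemistry Journal",
     "abbreviated_name": "Open Med.Chem.J."},  # hypothetical abbreviation
]


def by_abbreviated_name(journals, name):
    """Exact match on the abbreviated name; None if absent."""
    return next((j for j in journals if j["abbreviated_name"] == name), None)


def match_abbreviated_name(journals, pattern):
    """Regular-expression match, anchored at the start, on the abbreviated name."""
    regex = re.compile(pattern)
    return [j for j in journals if regex.match(j["abbreviated_name"])]


def simple_search(journals, text):
    """Case-insensitive substring search over every name field."""
    needle = text.lower()
    return [j for j in journals
            if any(needle in j[field].lower()
                   for field in ("full_name", "abbreviated_name"))]


print([j["full_name"] for j in simple_search(JOURNALS, "open med")])
# ['The Open Medicinal Chemistry Journal']
```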
When performing a ccdc.search.TextNumericSearch by journal, either a ccdc.entry.Journal or its ccdc.entry.Journal.name may be specified.