Working with entries
Typically a database entry will be read from the CSD. Let us therefore import the
ccdc.io module and read in the first entry from the CSD.
>>> from ccdc import io
>>> csd_reader = io.EntryReader('CSD')
>>> first_csd_entry = csd_reader[0]
Accessing entry properties
Let us have a look at the crystallographic properties available to us from a CSD crystal structure.
>>> mol = first_csd_entry.crystal.molecule
>>> print(mol.identifier)
AABHTZ
Some entries in the CSD exhibit disorder. By default the molecule returned by an entry will
suppress disordered atoms. For example, ‘ABABUB’ in the CSD is such an entry:
>>> ababub = csd_reader.entry('ABABUB')
>>> ababub_mol = ababub.molecule
>>> print(len(ababub_mol.atoms))
30
>>> print(ababub_mol.formula)
C12 H15 N1 O2
A convention within the CSD is that disordered atoms have labels ending with a ‘?’. The full molecule, including the suppressed atoms, can be retrieved:
>>> disordered_mol = ababub.disordered_molecule
>>> print(len(disordered_mol.atoms))
42
>>> print(disordered_mol.formula)
C12 H15 N1 O2
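The label convention can be used to pick out the disordered atoms. The following is a minimal sketch that runs without the CSD: the disordered_labels helper and the stand-in atom objects are hypothetical, and the only thing assumed of an atom is its label attribute. With a real entry the same helper could be given disordered_mol.atoms.

```python
from types import SimpleNamespace


def disordered_labels(atoms):
    """Return the labels of atoms whose label ends with '?' (the CSD disorder convention)."""
    return [atom.label for atom in atoms if atom.label.endswith('?')]


# Stand-in atoms with invented labels, used only so the sketch runs without the CSD.
atoms = [SimpleNamespace(label=label) for label in ('C1', 'C2', 'C3?', 'H1', 'H2?')]
print(disordered_labels(atoms))
```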
For entries that have been accessed from the CSD, one can also access the chemical name and formula of the crystal content.
>>> print(first_csd_entry.chemical_name)
4-Acetoamido-3-(1-acetyl-2-(2,6-dichlorobenzylidene)hydrazine)-1,2,4-triazole
>>> print(first_csd_entry.formula)
C13 H12 Cl2 N6 O2
Let us illustrate some more properties available for CSD entries using a crystal structure of ibuprofen.
>>> ibuprofen = csd_reader.entry('IBPRAC')
>>> print(ibuprofen.has_3d_structure)
True
>>> print(ibuprofen.has_disorder)
False
>>> print(ibuprofen.is_organometallic)
False
>>> print(ibuprofen.is_polymeric)
False
>>> print(ibuprofen.bioactivity)
analgesic and antiinflammatory agent
>>> print('\n'.join(ibuprofen.synonyms))
Ibuprofen
Advil
Motrin
Nurofen
DrugBank: DB01050
PDB Chemical Component code: IZP
This particular CSD entry does not have a publication DOI.
>>> print(ibuprofen.publication.doi)
None
However, where a DOI is defined it will be returned.
>>> print(csd_reader.entry('ABEBUF').publication.doi)
10.1021/cg049957u
>>> print(csd_reader.entry('ABEBUF').publication)
Citation(authors='S.W.Gordon-Wylie, E.Teplin, J.C.Morris, M.I.Trombley, S.M.McCarthy, W.M.Cleaver, G.R.Clark', journal='Journal(Crystal Growth and Design)', volume='4', year=2004, first_page='789', doi='10.1021/cg049957u')
A publication is returned as a named tuple, whose members may be retrieved by name or index. Its fields are authors, journal, volume, year, first_page and, where present, the publication’s doi.
>>> print(ibuprofen.publication)
Citation(authors='J.F.McConnell', journal='Journal(Crystal Structure Communications)', volume='3', year=1974, first_page='73', doi=None)
>>> print(ibuprofen.publication.authors)
J.F.McConnell
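Access by name and by index can be illustrated with a stand-in built from collections.namedtuple. The field names and values are copied from the output printed for the ibuprofen entry; the Citation defined here is only an illustration and is not the class used by the API, and the journal is simplified to a plain string.

```python
from collections import namedtuple

# Stand-in for the API's citation tuple; field order copied from the printed output.
Citation = namedtuple(
    'Citation', ['authors', 'journal', 'volume', 'year', 'first_page', 'doi'])

pub = Citation(
    authors='J.F.McConnell',
    journal='Crystal Structure Communications',  # simplified to a plain string
    volume='3', year=1974, first_page='73', doi=None)

print(pub.authors)            # access by name
print(pub[0])                 # access by index gives the same member
print(pub._asdict()['year'])  # a named tuple converts readily to a dictionary
```

Note that volume and first_page are strings in the printed output, while year is an integer.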
From a CSD entry it is also possible to retrieve the colour, melting point, polymorphic form description, disorder details, radiation source and deposition date, where this information is available in the underlying CSD entry.
>>> print(ibuprofen.color)
None
>>> print(ibuprofen.melting_point)
None
>>> print(ibuprofen.polymorph)
polymorph 1
>>> print(ibuprofen.disorder_details)
None
>>> print(csd_reader.entry('ABABUB').disorder_details)
The cyclohexene ring is disordered over two sites with occupancies 0.5878:0.4122.
>>> print(csd_reader.entry('ABINOR01').radiation_source)
Neutron
>>> print(ibuprofen.deposition_date)
1974-06-21
The entry for ibuprofen has no editorial comments; where they are present they may represent editorial decisions made during structural curation, or patent information:
>>> print(ibuprofen.remarks)
None
>>> print(csd_reader.entry('ABAPCU').remarks)
The position of the hydrate is dubious. It has been deleted
>>> print(csd_reader.entry('ARISOK').remarks)
U.S. Patent: US 6858644 B2
An indication of whether an experiment was performed at non-ambient pressure is available. This is given as a string and the units have not yet been normalised. Where this is None, the experiment was performed at ambient pressure:
>>> print(ibuprofen.pressure)
None
>>> print(csd_reader.entry('ABULIT03').pressure)
at 1.4 GPa
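Because the pressure is free text, any numeric use requires parsing. The sketch below uses a hypothetical helper, pressure_in_gpa, that handles only strings of the form shown here (a number followed by a unit). The unit table is an assumption, and other formats that may occur in the CSD are not recognised and would need separate handling.

```python
import re

# Conversion factors to GPa for a few common units (an assumption; extend as needed).
_TO_GPA = {'gpa': 1.0, 'mpa': 1e-3, 'kbar': 0.1}


def pressure_in_gpa(text):
    """Return the pressure in GPa parsed from free text, or None if not recognised."""
    if text is None:
        return None  # the entry reports ambient pressure
    match = re.search(r'(\d+(?:\.\d+)?)\s*([A-Za-z]+)', text)
    if match is None:
        return None
    factor = _TO_GPA.get(match.group(2).lower())
    if factor is None:
        return None  # unit not in the table above
    return float(match.group(1)) * factor


print(pressure_in_gpa('at 1.4 GPa'))
print(pressure_in_gpa(None))
```

The helper returns None both for ambient pressure and for text it cannot parse, so a caller that needs to tell these apart should check the original string first.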
A boolean property indicates whether or not the crystallographic determination was performed in a powder study:
>>> print(ibuprofen.is_powder_study)
False
>>> print(csd_reader.entry('ACATAA').is_powder_study)
True
Within the CSD many structures are cross-referenced. These can be obtained and inspected as follows:
>>> cross_refs = ibuprofen.cross_references
>>> print(cross_refs)
(CrossReference(for stereoisomer see [JEKNOC]),)
>>> xref = cross_refs[0]
>>> print(xref.text)
for stereoisomer see [JEKNOC]
>>> print(xref.type)
Stereoisomer
>>> print(xref.scope)
Family
>>> print(xref.identifiers)
('JEKNOC',)
These properties would not be available for an entry read in from, for example, a mol2 file.
>>> filepath = 'ABEBUF.mol2'
To get access to the entry in this file we make use of an io.EntryReader.
>>> entry_reader = io.EntryReader(filepath)
>>> entry_from_mol2 = entry_reader[0]
>>> entry_reader.close()
>>> print(entry_from_mol2.chemical_name)
None
>>> print(entry_from_mol2.formula)
C19 H15 N3 O2
Where the database comes from an sdf file the EntryReader will give access to the SDF tags for the entry via the entry.attributes property:
>>> file_name = 'gold_output.sdf'
To get access to the entry in this file we again make use of an io.EntryReader.
>>> reader = io.EntryReader(file_name)
>>> entry_from_sdf = reader[0]
>>> for k, v in sorted(entry_from_sdf.attributes.items()):
...     print(k)
...     print(v)
...
Gold.Chemscore.DEClash
17.1265
Gold.Chemscore.DEClash.Weighted
17.1265
Gold.Chemscore.DEInternal
2.1334
Gold.Chemscore.DEInternal.Weighted
2.1334
Gold.Chemscore.DG
-34.5350
Gold.Chemscore.Fitness
15.2752
Gold.Chemscore.Hbond
0.9975
Gold.Chemscore.Hbond.Weighted
-3.3317
Gold.Chemscore.Internal_Hbond
0.0000
Gold.Chemscore.Internal_Hbond.Weighted
0.0000
Gold.Chemscore.Lipo
257.3928
Gold.Chemscore.Lipo.Weighted
-30.1150
Gold.Chemscore.Metal
0.0000
Gold.Chemscore.Metal.Weighted
0.0000
Gold.Chemscore.Rot
1.7155
Gold.Chemscore.Rot.Weighted
4.3916
Gold.Chemscore.ZeroCoef
-5.4800
Gold.Protein.ActiveResidues
PHE87 TYR96 PHE98 THR101 MET184 THR185 LEU244 VAL247 GLY248 THR252 VAL295 ASP297 ILE395 VAL396 HEM1
Gold.Protein.RotatedAtoms
42.8596 40.6557 11.7180 H 0 0 0 0 0 0 0 0 0 0 0 0 # atno 6359 bound_to 3214
39.9356 46.2719 13.3012 H 0 0 0 0 0 0 0 0 0 0 0 0 # atno 6443 bound_to 742
>>> reader.close()
The values of the attributes dict are all strings: it is left to the user to convert as appropriate.
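One way to do the conversion is sketched below. The convert_attributes helper is hypothetical and is not part of the API; it tries float() on each value and leaves anything non-numeric as a string. The sample dictionary contains a few tags copied from the GOLD output listed in this section, with the residue list shortened, so the sketch runs without an SDF file.

```python
def convert_attributes(attributes):
    """Return a copy of the dict with values converted to float where possible."""
    converted = {}
    for key, value in attributes.items():
        try:
            converted[key] = float(value)
        except (TypeError, ValueError):
            converted[key] = value  # leave non-numeric values unchanged
    return converted


# A few tags from the GOLD output, held as strings as the reader would return them.
attributes = {
    'Gold.Chemscore.Fitness': '15.2752',
    'Gold.Chemscore.DG': '-34.5350',
    'Gold.Protein.ActiveResidues': 'PHE87 TYR96 PHE98',
}
converted = convert_attributes(attributes)
print(converted['Gold.Chemscore.Fitness'] + converted['Gold.Chemscore.DG'])
print(converted['Gold.Protein.ActiveResidues'].split())
```

Space-separated values such as the residue list can then be split into a Python list, as in the last line.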
An entry may be constructed from a molecule with attributes constructed from arbitrary keyword parameters.
>>> from ccdc.entry import Entry
>>> aabhtz_entry = Entry.from_molecule(mol, annotation='First structure in CSD')
Attributes may also be added to an entry after it has been instantiated.
>>> aabhtz_entry.attributes['molecular_weight'] = mol.molecular_weight
Note that attributes of any value may be added to the entry. These attributes will be written to an sdf format file if an EntryWriter is used:
>>> with io.EntryWriter('aabhtz.sdf') as writer:
...     writer.write(aabhtz_entry)
The CSD contains information about the journals referenced by the entries in the database. This
may be obtained from the database by constructing an instance of
ccdc.entry.JournalList. This collection
can then be searched for journals of interest with
ccdc.entry.JournalList.simple_search(), which returns the
ccdc.entry.Journal instances matching a string pattern, whether the match is in the full name, the abbreviated name
or the translated form of either. For example:
>>> import ccdc
>>> entry_reader = io.EntryReader('CSD')
>>> journals = ccdc.entry.JournalList(entry_reader)
>>> entry_reader.close()
>>> print(len(journals))
1877
>>> print(journals[0].name)
Acta Crystallogr.
Newer versions of the CSD provide extended information in
ccdc.entry.Journal. This includes:
abbreviated_name (also known as name, for compatibility)
translated_name for journals in a language other than English
state (‘Discontinued’, ‘Current’ or ‘Unknown’)
end_year (for those journals which have ceased publication)
url (where known)
image_url (where known)
ASTM international coden.
ccdc.entry.JournalList allows searches by exact full name, or exact abbreviated name, by a regular expression
on full name or abbreviated name, and a case-insensitive search by any of the name fields:
>>> print([str(j) for j in journals.simple_search('open med')])
['Journal(The Open Medicinal Chemistry Journal)']
>>> print(journals.by_abbreviated_name('Acta Crystallogr.'))
Journal(Acta Crystallographica [1948-1967])
>>> print([str(j) for j in journals.match_abbreviated_name(r'Acta\s*[Cc]rys.*')])
['Journal(Acta Crystallographica [1948-1967])', 'Journal(Acta Crystallographica,Section B:Struct.Crystallogr.Cryst.Chem. [1968-1982])', ...]
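To see what the regular expression in the last call accepts, it can be tried with Python's re module on a few names. This sketch does not use the CSD. The first candidate is the abbreviated name printed earlier in this section and the other two are illustrative strings not taken from the database. Whether the API anchors the pattern at the start of the name, as re.match does here, is an assumption.

```python
import re

pattern = re.compile(r'Acta\s*[Cc]rys.*')

candidates = [
    'Acta Crystallogr.',  # abbreviated name printed earlier in this section
    'Actacrys',           # illustrative: no whitespace, lower-case 'c'
    'J.Am.Chem.Soc.',     # illustrative non-matching name
]
matches = [name for name in candidates if pattern.match(name)]
print(matches)

# A case-insensitive substring test, in the spirit of simple_search('open med').
name = 'The Open Medicinal Chemistry Journal'
print('open med' in name.lower())
```

The \s* in the pattern allows any amount of whitespace, including none, between the two words, which is why the second candidate also matches.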