Descriptors API
Introduction
The ccdc.descriptors module contains classes for calculating descriptors.
The main classes in the ccdc.descriptors module are:
- ccdc.descriptors.MolecularDescriptors
- ccdc.descriptors.GeometricDescriptors
- ccdc.descriptors.CrystalDescriptors.PowderPattern
- ccdc.descriptors.CrystalDescriptors.Morphology
- ccdc.descriptors.CrystalDescriptors.GraphSetSearch
- ccdc.descriptors.CrystalDescriptors.HBondCoordination
- ccdc.descriptors.CrystalDescriptors.HBondPropensities
- ccdc.descriptors.StatisticalDescriptors
API
-
class
ccdc.descriptors.
MolecularDescriptors
[source]¶ Namespace for descriptors of a molecular nature.
-
class
AdjacencyMatrixDescriptorCalculator
(molecule)[source]¶ Descriptor calculator for descriptors based on a molecule’s adjacency matrix.
-
self_returning_walk
(k)[source]¶ Return the number of walks of length k that start and end at the same atom. See Handbook of Molecular Descriptors, page 384, “self-returning walk counts”.
Parameters: k – the number of steps to walk. Returns: float
-
self_returning_walk_ln
(k)[source]¶ Return the logarithm of the number of walks of length k that start and end at the same atom. See Handbook of Molecular Descriptors, page 384, “self-returning walk counts”.
Parameters: k – the number of steps to walk. Returns: float
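As a rough illustration of what these two descriptors measure, the sketch below counts closed walks from a plain adjacency matrix, using the fact that entry (i, i) of the k-th matrix power is the number of walks of length k from atom i back to itself. It does not call the ccdc API: the helper names are hypothetical, and summing over all atoms (the trace) is an assumption, since the aggregation used by the library is not stated here.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def self_returning_walk_count(adjacency, k):
    """Total number of walks of length k that start and end on the same atom.

    Hypothetical helper: computes trace(A**k) for the adjacency matrix A.
    """
    n = len(adjacency)
    # Start from the identity matrix (A**0) and multiply by A k times.
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        power = mat_mul(power, adjacency)
    return float(sum(power[i][i] for i in range(n)))


# Heavy-atom graph of propane: C1-C2-C3.
propane = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

walks_2 = self_returning_walk_count(propane, 2)  # equals the sum of atom degrees
walks_3 = self_returning_walk_count(propane, 3)  # a chain has no odd closed walks
log_walks_2 = math.log(walks_2)                  # logarithmic variant
```

For k = 2 the count equals the sum of the atom degrees, which is a convenient sanity check on any implementation.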
-
topological_charge_autocorrelation_index
(k)[source]¶ Calculate the topological charge autocorrelation index. See https://pubs.acs.org/doi/pdf/10.1021/ci00019a008
Parameters: k – the topological distance to measure across Returns: float
-
-
class
AtomDistanceSearch
(molecule)[source]¶ More rapid searching for atoms within a certain distance of a point.
-
class
AtomPairDistanceDescriptorCalculator
(molecule)[source]¶ Atom pair distance descriptor calculations.
Parameters: molecule – a ccdc.molecule.Molecule instance.
-
element_pair_count
(element_a, element_b, distance)[source]¶ Return a count of the number of times a pair of elements appear with a specified minimum path length. See Handbook of Molecular Descriptors, page 428, “substructure descriptors, atom pairs”.
Parameters: - element_a – str. the first element name.
- element_b – str. the second element name.
- distance – int. the number of bonds between atoms of the specified elements.
Returns: float
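The atom-pair count can be pictured as a shortest-path search over the bond graph. The sketch below is a stand-alone illustration, not the ccdc implementation: the molecule is represented by a hypothetical list of element symbols plus a list of bonded index pairs, and each unordered atom pair is counted once, which is an assumption.

```python
from collections import deque


def shortest_path_lengths(neighbours, start):
    """Breadth-first search giving the bond count from start to every reachable atom."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for other in neighbours[atom]:
            if other not in distances:
                distances[other] = distances[atom] + 1
                queue.append(other)
    return distances


def element_pair_count(elements, bonds, element_a, element_b, distance):
    """Count unordered atom pairs of the given elements separated by exactly distance bonds."""
    neighbours = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    count = 0
    for i in range(len(elements)):
        for j, d in shortest_path_lengths(neighbours, i).items():
            if j <= i or d != distance:
                continue  # visit each unordered pair once, at the requested distance
            if {elements[i], elements[j]} == {element_a, element_b}:
                count += 1
    return float(count)


# Heavy atoms of ethanol: C0-C1-O2.
elements = ['C', 'C', 'O']
bonds = [(0, 1), (1, 2)]
co_bonded = element_pair_count(elements, bonds, 'C', 'O', 1)
co_two_bonds = element_pair_count(elements, bonds, 'C', 'O', 2)
```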
-
-
class
ConnectivityIndices
(molecule)[source]¶ Connectivity index descriptor calculations.
-
class
MaximumCommonSubstructure
(settings=None)[source]¶ Identifies the maximum common substructure of two molecules.
-
class
Settings
[source]¶ Settings for the MCS calculation.
-
check_bond_count
¶ Whether the bond count of an atom should be checked.
-
check_bond_polymeric
¶ Whether the polymeric status of a bond should be checked.
-
check_bond_type
¶ Whether the bond type should be checked.
-
check_charge
¶ Whether the atom charge should be checked.
-
check_element
¶ Whether the element should be checked.
-
check_hydrogen_count
¶ Whether the atom’s hydrogen count should be checked.
-
connected
¶ Whether the substructure should be connected.
Note that finding disconnected maximal substructures is a lot slower than finding connected ones.
-
ignore_hydrogens
¶ Whether hydrogens should be ignored.
-
-
search
(mol1, mol2, only_edges=False)[source]¶ Calculate the maximum common substructure between two molecules.
Parameters: - mol1, mol2 – ccdc.molecule.Molecule instances.
- only_edges – bool. The search will find a maximal common substructure matching only the edges.
Returns: a pair of tuples, giving matched ccdc.molecule.Atom and ccdc.molecule.Bond instances.
Note: this function has exponential computational cost, so it will take a long time on large molecules.
-
class
PrincipleAxesAlignedBox
(molecule)[source]¶ The bounding box of the molecule aligned on its principal axes.
Each vector of the box has a length equal to the size of the box along that axis. The x_vector is the major axis of the molecule, the y_vector the minor axis and the z_vector the minimal axis of the molecule.
-
aligned_molecule
¶ The molecule aligned along its principal axes, with centre at its centre of geometry.
-
volume
¶ The volume of the box.
-
x_vector
¶ The vector of the major axis of the box.
-
y_vector
¶ The vector of the minor axis of the box.
-
z_vector
¶ The vector of the minimal axis of the box.
-
-
static
atom_angle
(a, b, c)[source]¶ Angle subtended by three arbitrary atoms.
Parameters: - a –
ccdc.molecule.Atom
- b –
ccdc.molecule.Atom
- c –
ccdc.molecule.Atom
Returns: float - the angle in degrees, or None if one of the atoms has no coordinates.
-
static
atom_distance
(a, b)[source]¶ Distance between two atoms, irrespective of their parent molecules.
Parameters: - a –
ccdc.molecule.Atom
- b –
ccdc.molecule.Atom
Returns: float, or None if one of the atoms has no coordinates.
-
static
atom_plane
(*atoms)[source]¶ Define a plane from the coordinates of the atoms.
Parameters: atoms – at least three ccdc.molecule.Atom instances must be passed as arguments.
-
static
atom_torsion_angle
(a, b, c, d)[source]¶ Torsion angle between the planes defined by the atom triples abc and bcd.
Parameters: - a –
ccdc.molecule.Atom
- b –
ccdc.molecule.Atom
- c –
ccdc.molecule.Atom
- d –
ccdc.molecule.Atom
Returns: float - the angle in degrees, or None if one of the atoms has no coordinates.
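To make the geometry concrete, here is a minimal sketch of a torsion angle computed from four points, using the standard atan2 formulation on the normals of the planes abc and bcd. Plain coordinate tuples stand in for ccdc.molecule.Atom objects, the helper names are hypothetical, and the sign convention produced by this formula is not guaranteed to match the library's.

```python
import math


def sub(p, q):
    """Vector from q to p."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def torsion_angle(a, b, c, d):
    """Torsion angle in degrees for points a-b-c-d, or None if any point is missing."""
    if any(p is None for p in (a, b, c, d)):
        return None  # mirrors the documented behaviour for atoms without coordinates
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of planes abc and bcd
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Central bond b-c along z; a and d placed to give eclipsed, perpendicular and anti arrangements.
b, c = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
a = (1.0, 0.0, 0.0)
eclipsed = torsion_angle(a, b, c, (1.0, 0.0, 1.0))
perpendicular = torsion_angle(a, b, c, (0.0, 1.0, 1.0))
anti = torsion_angle(a, b, c, (-1.0, 0.0, 1.0))
```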
-
static
atom_vector
(atom0, atom1)[source]¶ Define the vector from atom0 to atom1.
Parameters: - atom0 –
ccdc.molecule.Atom
- atom1 –
ccdc.molecule.Atom
Returns: Raises: RuntimeError if either atom has no coordinates.
-
static
bond_length
(bond)[source]¶ The length of a bond.
Parameters: bond – ccdc.molecule.Bond
Returns: float, or None
if an atom of the bond has no coordinates
-
static
overlay
(mol1, mol2, atoms=None, invert=False, rotate_torsions=False, with_symmetry=True)[source]¶ Overlay mol2 on mol1.
Parameters: - mol1 – a ccdc.molecule.Molecule instance
- mol2 – a ccdc.molecule.Molecule instance
- atoms – a list of pairs of atoms to use in the overlay, or None for all atoms to be used
- invert – allow inversions in the overlay
- rotate_torsions – allow torsional rotations when overlaying
- with_symmetry – take account of symmetry when overlaying atoms
Returns: a ccdc.molecule.Molecule instance which is a copy of mol2 overlaid on mol1
Note: if with_symmetry is true, and matching atoms are provided, then the matching atoms need to form a connected structure.
-
static
overlay_rmsd_and_rmsd_tanimoto
(mol1, mol2, atoms=None, invert=False, rotate_torsions=False, with_symmetry=True)[source]¶ Overlay mol2 on mol1. Deprecated and replaced with ccdc.descriptors.MolecularDescriptors.overlay_rmsds_and_transformation().
Parameters: - mol1 – a ccdc.molecule.Molecule instance
- mol2 – a ccdc.molecule.Molecule instance
- atoms – a list of pairs of atoms to use in the overlay, or None for all atoms to be used
- invert – allow inversions in the overlay
- rotate_torsions – allow torsional rotations when overlaying
- with_symmetry – take account of symmetry when overlaying atoms
Returns: a tuple containing a ccdc.molecule.Molecule instance which is a copy of mol2 overlaid on mol1 as entry 0, the RMSD as entry 1, the Tanimoto RMSD as entry 2
-
static
overlay_rmsds_and_transformation
(mol1, mol2, atoms=None, invert=False, rotate_torsions=False, with_symmetry=True)[source]¶ Overlay mol2 on mol1 and return properties of the overlay.
Parameters: - mol1 – a ccdc.molecule.Molecule instance
- mol2 – a ccdc.molecule.Molecule instance
- atoms – a list of pairs of atoms to use in the overlay, or None for all atoms to be used
- invert – allow inversions in the overlay
- rotate_torsions – allow torsional rotations when overlaying
- with_symmetry – take account of symmetry when overlaying atoms
Returns: a tuple containing a ccdc.molecule.Molecule instance which is a copy of mol2 overlaid on mol1 as entry 0, the RMSD as entry 1, the Tanimoto RMSD as entry 2 and the overlay transformation as entry 3
-
static
point_group_analysis
(mol)[source]¶ Return Schoenflies notation of the point group symmetry of a molecule.
The point group symmetry is returned as a tuple of:
- order (e.g. 1)
- symbol (e.g. ‘C1’)
- description (e.g. ‘Objects in this point group have no symmetry.’)
Parameters: mol – ccdc.molecule.Molecule
Returns: (int, str, str)
-
static
ring_centroid
(ring)[source]¶ The centroid of the ring’s atoms.
Parameters: ring – ccdc.molecule.Molecule.Ring
-
static
ring_plane
(ring)[source]¶ The plane of the ring’s atoms.
Parameters: ring – ccdc.molecule.Molecule.Ring
-
static
rmsd
(mol1, mol2, atoms=None, overlay=False, exclude_hydrogens=True, with_symmetry=True)[source]¶ Return the RMSD of two molecules.
Both molecules should have the same atoms if atoms is None.
Parameters: - atoms – a list of pairs of ccdc.molecule.Atom, or None
- overlay – Whether to overlay the molecules before calculating RMSD
- exclude_hydrogens – Whether to calculate a heavy-atom RMSD rather than an all-atom RMSD
- with_symmetry – Whether to allow symmetrical matches
Returns: float
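For reference, the quantity being reported is the root-mean-square deviation over matched atom pairs. The sketch below shows only that final arithmetic on already-paired coordinates. It is not the library routine: there is no overlay step, no symmetry handling and no hydrogen filtering, and pairing atoms by list position is an assumption.

```python
import math


def rmsd(coords1, coords2):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError('coordinate lists must be non-empty and of equal length')
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(coords1, coords2)
    )
    return math.sqrt(squared / len(coords1))


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]    # every atom moved by 1 along z
one_moved = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]  # first atom moved by 5, second unchanged

rmsd_shifted = rmsd(reference, shifted)
rmsd_one_moved = rmsd(reference, one_moved)
```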
-
class
ccdc.descriptors.
GeometricDescriptors
[source]¶ A namespace to hold geometric classes and functions.
-
class
Plane
(vector, distance, _plane=None)[source]¶ A plane in 3D.
-
distance
¶ The distance of the plane from the origin.
-
normal
¶ The normal to the plane.
-
plane_vector1
¶ A vector in the plane, normal to the plane’s normal.
-
plane_vector2
¶ A vector in the plane, normal to both the plane’s normal and the plane’s plane_vector1.
-
Note
The powder pattern, morphology, hydrogen-bond coordination and graph set features are available only to CSD-Materials and CSD-Enterprise users.
-
class
ccdc.descriptors.
CrystalDescriptors
[source]¶ Namespace for crystallographic descriptors.
-
class
GraphSetSearch
(settings=None)[source]¶ Finds the graph sets of a crystal.
-
class
GraphSet
(_graph_set_atoms, _view)[source]¶ An individual graph set.
-
degree
¶ The degree of the graph set, i.e. the number of atoms involved.
-
descriptor
¶ The descriptor of the graph set.
-
edge_labels
¶ The edge labels of the graph set.
The labels are arbitrary letters identifying a unique hydrogen bond, separated by ‘>’ or ‘<’ indicating the donor-acceptor direction.
-
hbonds
¶ The hydrogen bonds of the graph set.
Returns: a tuple of ccdc.crystal.Crystal.HBond
instances.
-
label_set
¶ The set of hydrogen bond labels found in the graph set.
-
nacceptors
¶ The number of acceptors involved in the graph set.
-
ndonors
¶ The number of donors involved in the graph set.
-
nmolecules
¶ The number of molecules involved in the graph set.
-
period
¶ The period of the graph set, i.e. the number of hydrogen bonds in the repeat unit.
If the type of the graph set is not a chain or a ring this will be -1.
-
-
class
Settings
(hbond_criterion=None)[source]¶ Configurable settings for the graph set analyser.
-
angle_tolerance
¶ The tolerance of the HBond angle.
-
distance_range
¶ Allowable distance range for a HBond to be formed.
-
intermolecular
¶ Whether HBonds should be intermolecular, intramolecular, or any.
-
level
= 2¶ Deepest level to search. This is the number of different HBonds involved.
-
max_chain_size
= 4¶ Longest chain to search.
-
max_discrete_chain_size
= 4¶ Longest discrete chain to search.
-
max_ring_size
= 6¶ Largest ring to search.
-
path_length_range
¶ The shortest and longest bond-path separation for intramolecular contacts.
-
require_hydrogens
¶ Whether Hydrogens are required for the HBond.
-
vdw_corrected
¶ Whether the distance range is Van der Waals corrected.
-
-
search
(crystal)[source]¶ Find all graph sets for the crystal subject to the constraints of the settings.
Parameters: crystal – ccdc.crystal.Crystal instance.
Returns: a tuple of ccdc.descriptors.CrystalDescriptors.GraphSetSearch.GraphSet instances.
-
class
HBondCoordination
(settings=None)[source]¶ Calculate HBond coordination predictions.
The HBondCoordination class is available only to CSD-Materials and CSD-Enterprise users.
-
class
Predictions
(crystal, _analysis, _predictions)[source]¶ The predictions for HBond coordinations.
-
class
Observation
(label, coordination_count, probability)¶
-
coordination_count
¶ Alias for field number 1
-
label
¶ Alias for field number 0
-
probability
¶ Alias for field number 2
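The "Alias for field number" descriptions are what Sphinx generates for a named tuple, so an Observation can presumably be read either by attribute or by position. The sketch below rebuilds an equivalent type with collections.namedtuple purely for illustration. The field values are made up, and the real class should be obtained from the library rather than redefined.

```python
from collections import namedtuple

# Stand-in with the same field order as documented: label, coordination_count, probability.
Observation = namedtuple('Observation', ['label', 'coordination_count', 'probability'])

# Hypothetical values for illustration only.
obs = Observation(label='O1', coordination_count=2, probability=0.75)

by_name = (obs.label, obs.coordination_count, obs.probability)
by_index = (obs[0], obs[1], obs[2])
label, coordination_count, probability = obs  # tuple unpacking also works
```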
-
-
functional_groups_of_hbond
(hbond)[source]¶ The functional group pertaining to a hydrogen-bonding atom.
-
is_valid
¶ Whether or not valid predictions were made.
-
observed
¶ The predicted probabilities of observed HBonds.
-
class
HBondPropensities
(settings=None)[source]¶ Calculates HBond propensities.
-
class
FittingData
(_fitting_data=None, identifiers=None, databases=None)[source]¶ The collection of entries used for the prediction.
-
class
FittingDataEntry
(_fitting_item)[source]¶ An individual entry with associated matching data.
-
identifier
¶ The identifier of the fitting data item.
-
-
advice_comment
(functional_group=None)[source]¶ A string indicating whether or not there are enough data for propensity predictions.
Note: when first created, the fitting data has not yet undergone substructure matching, so results for particular groups will be misleadingly poor. Results will be valid after ccdc.descriptors.CrystalDescriptors.HBondPropensities.match_fitting_data() has been called.
-
class
FunctionalGroup
(_model_group)[source]¶ A functional group capable of hydrogen bonding.
-
identifier
¶ The name of the functional group.
-
-
class
HBondAcceptor
(_analysis)[source]¶ A potential acceptor atom.
This class will be augmented with the evidence found during match_fitting_data().
-
acceptor_atom_type
¶ A string representation of the atom’s acceptor type.
-
-
class
HBondAtom
(_analysis)[source]¶ Base class for HBondDonor and HBondAcceptor.
-
accessible_surface_area
¶ The accessible surface area of the HBond atom.
-
atom
¶ The
ccdc.molecule.Atom
of the HBondAtom.
-
functional_group_identifier
¶ The identifier of the functional group for this atom.
-
identifier
¶ The full identifier of this atom.
-
label
¶ The label of the atom in the original structure.
-
nlone_pairs
¶ The number of lone pairs associated with this atom.
-
-
class
HBondDonor
(_analysis)[source]¶ A potential donor atom.
This class will be augmented with the evidence found during match_fitting_data().
-
donor_atom_type
¶ A string representation of the atom’s donor type.
-
-
class
HBondGrouping
(hbond_propensities, _outcome)[source]¶ A grouping of interactions between donors and acceptors representing a possible hbond network.
This represents a point in the chart of Mercury’s HBondPropensity wizard.
-
class
InterPropensity
(hbp, _prediction)[source]¶ Predicted propensity for a single HBond.
-
is_intermolecular
¶ Whether or not the predicted propensity is for an intermolecular HBond.
-
-
class
IntraPropensity
(hbp, _prediction)[source]¶ Predicted propensity for an intramolecular HBond.
-
is_intermolecular
¶ Whether or not the predicted propensity is for an intermolecular HBond.
-
-
class
Model
(_model)[source]¶ The logistic regression model.
-
class
Coefficient
(_coefficient)[source]¶ A coefficient of the regression model.
-
confidence_interval
¶ The upper and lower bounds of the coefficient.
-
estimate
¶ The estimate of the coefficient.
-
identifier
¶ The identifier of the coefficient.
-
is_baseline
¶ Whether or not the coefficient is used for the baseline calculation.
-
p_value
¶ P-value of the coefficient.
-
significance_code
¶ A string representation of how significant the parameter is.
‘***’ for P-value < 0.001, ‘**’ < 0.01, ‘*’ < 0.05 and ‘.’ < 0.1
-
standard_error
¶ Standard error of the coefficient.
-
z_value
¶ Z-value of the coefficient.
-
-
class
Parameter
(_crystal_structure_property)[source]¶ A named parameter of the regression.
-
calculate
(donor, acceptor)[source]¶ The value of this property for the pair of atoms.
Parameters: - donor – ccdc.descriptors.CrystalDescriptors.HBondPropensities.HBondDonor instance.
- acceptor – ccdc.descriptors.CrystalDescriptors.HBondPropensities.HBondAcceptor instance.
Returns: float
-
identifier
¶ The identifier of the parameter.
-
-
advice_comment
¶ A string representing the quality of the discrimination based on the ROC.
-
akaike_information_criterion
¶ The Akaike Information Criterion (AIC) of the model.
-
area_under_roc_curve
¶ Area under the ROC curve.
-
coefficients
¶ The coefficients of the model.
-
equation
¶ The regression equation.
-
log_likelihood
¶ The log likelihood of the model.
-
log_likelihood_test_p_value
¶ The P-value of the log likelihood of the model.
-
null_deviance
¶ The null deviance of the model.
-
null_deviance_degrees_of_freedom
¶ The degrees of freedom of the null deviance of the model.
-
residual_deviance
¶ The residual deviance of the model.
-
residual_deviance_degrees_of_freedom
¶ The number of degrees of freedom of the residual deviance of the model.
-
class
Propensity
(hbp, _prediction)[source]¶ Base class for inter- and intra-molecular propensity predictions.
-
acceptor_label
¶ The label of the acceptor atom.
-
bounds
¶ The lower and upper bounds of the prediction.
-
donor_label
¶ The label of the donor atom.
-
is_observed
¶ Whether the hbond is observed in the target structure.
-
predictive_error
¶ The error in the prediction.
-
propensity
¶ The predicted value.
-
scores
¶ The calculated values and statistics for the hbond prediction.
-
uncertainty
¶ The uncertainty in the prediction.
-
-
class
Settings
[source]¶ Pertaining to HBond propensity calculation.
-
databases
¶ The databases to be used for the prediction.
Note: the databases MUST be SQLite ASER databases for the moment.
-
limit_identifier_list
¶ A list of identifiers to limit the search
-
working_directory
¶ The working directory for the predictions.
-
-
calculate_propensities
(crystal=None)[source]¶ Apply the regression equation to a crystal.
Parameters: crystal – ccdc.crystal.Crystal
instance or None. If None the target structure will be used.
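Since the model is described as a logistic regression, applying the regression equation amounts to passing a weighted sum of parameter values through the logistic function. The sketch below shows that step in isolation. The parameter name and coefficient values are hypothetical, and the actual terms are those reported by the fitted Model (its equation and coefficients).

```python
import math


def logistic_propensity(intercept, coefficients, values):
    """Evaluate a logistic regression equation for one donor-acceptor pair.

    coefficients and values are dicts keyed by (hypothetical) parameter names.
    """
    linear = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


# With no terms the linear predictor is zero, giving a propensity of 0.5.
neutral = logistic_propensity(0.0, {}, {})

# One illustrative term chosen so that the odds are 3:1 in favour of the bond forming.
favoured = logistic_propensity(
    0.0,
    {'example_parameter': math.log(3.0)},
    {'example_parameter': 1.0},
)
```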
-
fitting_data
¶ The fitting data.
-
generate_hbond_groupings
(min_donor_prob=None, min_acceptor_prob=None)[source]¶ Generate all possible permutations of donors and acceptors to create all possible hbond groupings.
-
hbond_atoms
(crystal=None)[source]¶ The HBondDonor and HBondAcceptor atoms of a crystal.
Parameters: crystal – ccdc.crystal.Crystal instance, or None, in which case the HBondAtoms of the target will be returned.
Returns: a pair of tuples of ccdc.descriptors.CrystalDescriptors.HBondPropensities.HBondDonor and ccdc.descriptors.CrystalDescriptors.HBondPropensities.HBondAcceptor.
-
make_fitting_data
()[source]¶ Deprecated method. Please use match_fitting_data, or use CrystalDescriptors.HBondPropensities.FittingData.from_file to limit the entries that are searched.
Returns: an object that will cause all of the database entries to be searched.
-
match_fitting_data
(count=None, verbose=False)[source]¶ Reduces fitting data down such that each functional group has at least the specified number of examples.
-
propensities
¶ The inter- and intra-propensities of the prediction.
-
set_target
(crystal)[source]¶ Sets a single target for the propensity calculation.
Parameters: crystal – a ccdc.crystal.Crystal
instance.
-
class
Morphology
(crystal=None)[source]¶ The BFDH morphology of a crystal.
The morphology class is available only to CSD-Materials and CSD-Enterprise users.
-
class
Facet
(_facet, _perpendicular_distance, _miller_indices)[source]¶ One of the faces of a morphology.
-
area
¶ The area of the polygon.
-
centre_of_geometry
¶ The centre of geometry of the facet.
-
coordinates
¶ The coordinates of the facet.
-
edges
¶ The edges making up the facet.
-
miller_indices
¶ The Miller indices of the facet.
-
perpendicular_distance
¶ The perpendicular distance from the origin.
-
plane
¶ The plane of the facet.
This is a
ccdc.descriptors.GeometricDescriptors.Plane
instance.
-
-
class
OrientedBoundingBox
(morphology)[source]¶ The bounding box of the morphology.
This box is not necessarily axis-aligned.
-
corners
¶ The eight points forming the corners of the bounding box.
-
major_length
¶ The length of the major axis.
-
median_length
¶ The length of the middle axis.
-
minor_length
¶ The length of the minor axis.
-
volume
¶ The volume of the bounding box.
-
-
bounding_box
¶ The bounding box of the morphology.
A pair of
ccdc.molecule.Coordinates
representing the minimum and maximum corners of the box.
-
centre_of_geometry
¶ The centroid of the morphology.
-
facets
¶ The faces making up the morphology.
-
static
from_file
(file_name)[source]¶ Creates a Morphology instance from a CIF file.
The CIF file should be one written by this class or by Mercury.
-
static
from_growth_rates
(crystal, growth_rates)[source]¶ Creates a morphology from an iterable of growth rates.
Parameters: - crystal – an instance of ccdc.crystal.Crystal.
- growth_rates – an iterable of pairs of ccdc.crystal.Crystal.MillerIndices and perpendicular distance, otherwise known as morphological importance.
-
oriented_bounding_box
¶ The minimum volume box of the morphology.
This will not necessarily be aligned to the orthonormal Cartesian axes.
-
scale_factor
¶ The factor by which the morphology is scaled.
-
volume
¶ The volume of the morphology.
This is calculated stochastically, rather than analytically, so has some error.
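To show why a stochastic volume carries some error, the sketch below estimates the volume of a convex polyhedron by uniform sampling inside a box and counting the points that fall on the inner side of every facet plane. This is a generic Monte Carlo estimate under stated assumptions, not the library's algorithm: each facet is represented by a hypothetical (outward normal, perpendicular distance) pair, and the sample count and seed are arbitrary.

```python
import random


def inside(point, half_spaces):
    """True if normal . point <= distance holds for every facet plane."""
    return all(
        n[0] * point[0] + n[1] * point[1] + n[2] * point[2] <= d
        for n, d in half_spaces
    )


def monte_carlo_volume(half_spaces, box_min, box_max, samples=50000, seed=0):
    """Estimate the volume of a convex polyhedron by uniform sampling in a box."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        point = tuple(rng.uniform(lo, hi) for lo, hi in zip(box_min, box_max))
        if inside(point, half_spaces):
            hits += 1
    box_volume = 1.0
    for lo, hi in zip(box_min, box_max):
        box_volume *= hi - lo
    # Fraction of samples inside the polyhedron, scaled by the sampling box volume.
    return box_volume * hits / samples


# Unit cube centred on the origin: six facets, each 0.5 from the origin.
cube = [
    ((1, 0, 0), 0.5), ((-1, 0, 0), 0.5),
    ((0, 1, 0), 0.5), ((0, -1, 0), 0.5),
    ((0, 0, 1), 0.5), ((0, 0, -1), 0.5),
]
estimate = monte_carlo_volume(cube, (-1, -1, -1), (1, 1, 1))
```

The estimate approaches the true volume of 1 as the number of samples grows, with an error that shrinks roughly as the inverse square root of the sample count.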
-
class
PowderPattern
(_pattern, _settings=None, _simulation=None, _crystal=None)[source]¶ Represents a powder pattern.
The powder pattern class is available only to CSD-Materials and CSD-Enterprise users.
-
class
Settings
[source]¶ Settings which may be set for a Powder simulation.
Setting None for any of the attributes will result in a default value being used.
-
deuterium_is_hydrogen
= None¶ Whether deuterium and hydrogen are indistinguishable.
-
full_width_at_half_maximum
= None¶ Peak width at half height (0.1).
-
include_hydrogens
= None¶ Whether to include hydrogens.
-
second_wavelength
= None¶ Optional second wavelength.
-
two_theta_maximum
= None¶ Maximum value of two_theta (50.0).
-
two_theta_minimum
= None¶ Minimum value of two_theta (5.0).
-
two_theta_step
= None¶ Step size (0.02).
-
wavelength
= None¶ Wavelength for the simulation.
-
-
class
TickMark
(_tick, _crystal=None)[source]¶ A tick mark in a simulated powder pattern.
-
is_systematically_absent
¶ Whether this tick mark is systematically absent.
-
miller_indices
¶ The Miller indices of this tick mark.
-
two_theta
¶ Two theta value of this tick.
-
-
class
Wavelength
(wavelength=None, scale_factor=1.0)[source]¶ Represents a wavelength for powder studies.
Some standard wavelengths - these are floats, not
ccdc.descriptors.CrystalDescriptors.PowderPattern.Wavelength
-
scale_factor
¶ The scale factor of this Wavelength.
-
wavelength
¶ The wavelength.
-
-
esd
¶ The array of esd values (estimated standard deviations).
-
static
from_crystal
(crystal, settings=None)[source]¶ Create a CrystalDescriptors.PowderPattern from a crystal.
Parameters: - crystal – ccdc.crystal.Crystal
- settings – ccdc.descriptors.CrystalDescriptors.PowderPattern.Settings
-
static
from_xye_file
(file_name)[source]¶ Create a CrystalDescriptors.PowderPattern from an xye file.
Parameters: file_name – path to xye file
-
integral
(start=0.0, end=180.0)[source]¶ The area under the curve.
Parameters: - start – float
- end – float
Returns: float
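The area under a sampled pattern can be approximated with the trapezoidal rule over the two_theta and intensity arrays. The sketch below is an assumption about what such an integral involves and not a statement of how the library computes it. The function takes the two arrays explicitly, and only steps lying wholly inside [start, end] are counted.

```python
def pattern_integral(two_theta, intensity, start=0.0, end=180.0):
    """Trapezoidal area under intensity(two_theta) for steps inside [start, end]."""
    area = 0.0
    for i in range(len(two_theta) - 1):
        x0, x1 = two_theta[i], two_theta[i + 1]
        if x0 < start or x1 > end:
            continue  # skip steps that are not fully inside the requested range
        area += 0.5 * (intensity[i] + intensity[i + 1]) * (x1 - x0)
    return area


# Hypothetical four-point pattern with a single flat-topped peak.
two_theta = [5.0, 5.5, 6.0, 6.5]
intensity = [0.0, 2.0, 2.0, 0.0]

full_area = pattern_integral(two_theta, intensity)
peak_top_area = pattern_integral(two_theta, intensity, start=5.5, end=6.0)
```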
-
intensity
¶ The array of intensity values.
-
similarity
(other)[source]¶ Measure of match between this pattern and another.
Parameters: other – ccdc.descriptors.CrystalDescriptors.PowderPattern
Returns: float
-
tick_marks
¶ The array of tick marks if this is a simulated powder pattern.
Returns: list of ccdc.descriptors.CrystalDescriptors.PowderPattern.TickMark, or None if this is not a simulated powder pattern.
-
two_theta
¶ The array of two_theta values.
-
class
ccdc.descriptors.
StatisticalDescriptors
[source]¶ A namespace holding statistical descriptors.
-
class
RankStatistics
(scores, activity_column=None)[source]¶ Represents a ranked collection of values for which statistics can be derived.
-
ACC
(fraction=0.0)[source]¶ Calculate accuracy metric (ACC) at the specified fraction.
ACC = (TP+TN) / (TP+FP+TN+FN)
Parameters: fraction – position within data for which accuracy metric is to be determined. Raises: ValueError if fraction is not within interval [0,1]
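The formula can be worked through on a small ranked list. In the sketch below, which does not use the RankStatistics class, the data are a hypothetical best-first list of active/inactive flags, and everything in the top fraction is treated as predicted active. Rounding the cut-off to the nearest whole entry is an assumption.

```python
def confusion_counts(ranked_actives, fraction):
    """Return (TP, FP, TN, FN) when the top fraction of the ranking is predicted active."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError('fraction must be within the interval [0, 1]')
    cutoff = int(round(fraction * len(ranked_actives)))
    selected, rest = ranked_actives[:cutoff], ranked_actives[cutoff:]
    tp = sum(selected)        # actives ranked inside the cut-off
    fp = len(selected) - tp   # inactives ranked inside the cut-off
    fn = sum(rest)            # actives missed by the cut-off
    tn = len(rest) - fn       # inactives correctly left out
    return tp, fp, tn, fn


def accuracy(ranked_actives, fraction):
    """ACC = (TP+TN) / (TP+FP+TN+FN)."""
    tp, fp, tn, fn = confusion_counts(ranked_actives, fraction)
    return (tp + tn) / (tp + fp + tn + fn)


# Ten ranked entries, best score first; True marks an active.
ranked = [True, True, False, True, False, False, False, False, False, False]
counts_top_20 = confusion_counts(ranked, 0.2)
acc_top_20 = accuracy(ranked, 0.2)
```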
-
BEDROC
(alpha=0.0)[source]¶ Calculate Boltzmann-Enhanced Discrimination of ROC (BEDROC) as defined in:
Truchon J., Bayly C.I., “Evaluating Virtual Screening Methods: Good and Bad Metrics for the ‘Early Recognition’ Problem” J. Chem. Inf. Model. 47:488-508 (2007).
Parameters: alpha – exponential weighting factor. Raises: ValueError if alpha is less than or equal to 0.0.
-
EF
(fraction=0.0)[source]¶ Calculate enrichment factor (EF) at the specified fraction.
Parameters: fraction – position within data for which enrichment factor is to be determined. Raises: ValueError if fraction is not within interval [0,1]
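A commonly used definition of the enrichment factor is the hit rate within the selected fraction divided by the hit rate over the whole data set. The sketch below applies that definition to a hypothetical best-first list of active/inactive flags. The rounding of the cut-off and the value returned when nothing is selected are assumptions, and the library may treat these edge cases differently.

```python
def enrichment_factor(ranked_actives, fraction):
    """Hit rate in the top fraction divided by the hit rate in the full list."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError('fraction must be within the interval [0, 1]')
    total = len(ranked_actives)
    total_actives = sum(ranked_actives)
    cutoff = int(round(fraction * total))
    if cutoff == 0 or total_actives == 0:
        return 0.0  # assumption: nothing selected, or no actives, gives no enrichment
    found = sum(ranked_actives[:cutoff])
    return (found / cutoff) / (total_actives / total)


# Ten ranked entries, best score first; True marks an active.
ranked = [True, True, False, True, False, False, False, False, False, False]
ef_top_20 = enrichment_factor(ranked, 0.2)   # both selected entries are active
ef_everything = enrichment_factor(ranked, 1.0)
```

Selecting the whole list always gives a factor of 1, since the two hit rates are then identical.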
-
PPV
(fraction=0.0)[source]¶ Calculate precision or positive predictive value (PPV) at the specified fraction.
Parameters: fraction – position within data for which precision is to be determined. Raises: ValueError if fraction is not within interval [0,1]
-
RIE
(alpha=0.0)[source]¶ Calculate robust initial enhancement (RIE) as defined in:
Sheridan R.P., Singh S.B., Fluder E.M., Kearsley S.K., “Protocols for Bridging the Peptide to Nonpeptide Gap in Topological Similarity Searches” J. Chem. Inf. Comp. Sci. 41:1395-1406 (2001).
Parameters: alpha – exponential weighting factor Raises: ValueError if alpha is less than or equal to 0.0
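As a sketch of the metric, RIE can be written as the average exponential weight of the active ranks divided by the value expected for uniformly distributed ranks, following the closed form given by Truchon and Bayly. The function below takes a hypothetical list of 1-based active ranks and the total number of entries. It is an independent illustration, so the library's handling of ties and inputs may differ.

```python
import math


def rie(active_ranks, total, alpha):
    """Robust initial enhancement for 1-based ranks of actives among total entries."""
    if alpha <= 0.0:
        raise ValueError('alpha must be greater than 0.0')
    n = len(active_ranks)
    # Average exponential weight actually observed for the actives.
    observed = sum(math.exp(-alpha * r / total) for r in active_ranks) / n
    # Expected average weight if the actives were spread uniformly over the ranks.
    expected = (1.0 / total) * (1.0 - math.exp(-alpha)) / (math.exp(alpha / total) - 1.0)
    return observed / expected


early = rie([1, 2, 3], 100, 20.0)         # actives at the very top of the list
late = rie([98, 99, 100], 100, 20.0)      # actives at the very bottom
uniform = rie(list(range(1, 11)), 10, 20.0)  # every entry active
```

Values above 1 indicate that actives are concentrated early in the ranking, and values below 1 indicate the opposite.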
-
ROC
()[source]¶ Calculate receiver operating characteristic (ROC) curve.
Returns: list, list - True positive rate, False positive rate
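The two returned lists can be understood by stepping down the ranking and accumulating hits. The sketch below builds true-positive and false-positive rates from a hypothetical best-first list of active/inactive flags, returning them in the same order as documented. Starting both lists at 0.0 and advancing one entry at a time are assumptions about the curve's sampling.

```python
def roc_curve(ranked_actives):
    """Return (true positive rates, false positive rates) for a best-first ranking."""
    positives = sum(ranked_actives)
    negatives = len(ranked_actives) - positives
    if positives == 0 or negatives == 0:
        raise ValueError('both actives and inactives are needed for a ROC curve')
    tpr, fpr = [0.0], [0.0]
    tp = fp = 0
    for is_active in ranked_actives:
        if is_active:
            tp += 1  # an active moves the curve up
        else:
            fp += 1  # an inactive moves the curve right
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return tpr, fpr


true_positive_rate, false_positive_rate = roc_curve([True, False, True, False])
```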
-
activity_column
¶ Get extractor for active/inactive classification from scores data.
-