RDKit
Open-source cheminformatics and machine learning.
Groups a variety of molecular query and transformation operations.
Functions
int | countAtomElec (const Atom *at) |
return the number of electrons available on an atom to donate for aromaticity
int | getFormalCharge (const ROMol &mol) |
sums up all atomic formal charges and returns the result
bool | atomHasConjugatedBond (const Atom *at) |
returns whether or not the given Atom is involved in a conjugated bond
unsigned int | getMolFrags (const ROMol &mol, std::vector< int > &mapping) |
find fragments (disconnected components of the molecular graph)
unsigned int | getMolFrags (const ROMol &mol, std::vector< std::vector< int > > &frags) |
find fragments (disconnected components of the molecular graph)
std::vector< boost::shared_ptr< ROMol > > | getMolFrags (const ROMol &mol, bool sanitizeFrags=true, std::vector< int > *frags=0, std::vector< std::vector< int > > *fragsMolAtomMapping=0, bool copyConformers=true) |
splits a molecule into its component fragments
template<typename T >
std::map< T, boost::shared_ptr< ROMol > > | getMolFragsWithQuery (const ROMol &mol, T(*query)(const ROMol &, const Atom *), bool sanitizeFrags=true, const std::vector< T > *whiteList=0, bool negateList=false) |
splits a molecule into pieces based on labels assigned using a query
double | computeBalabanJ (const ROMol &mol, bool useBO=true, bool force=false, const std::vector< int > *bondPath=0, bool cacheIt=true) |
calculates Balaban's J index for the molecule
double | computeBalabanJ (double *distMat, int nb, int nAts) |
unsigned | getNumAtomsWithDistinctProperty (const ROMol &mol, std::string prop) |
returns the number of atoms which have a particular property set
Dealing with hydrogens
ROMol * | addHs (const ROMol &mol, bool explicitOnly=false, bool addCoords=false) |
returns a copy of a molecule with hydrogens added in as explicit Atoms
void | addHs (RWMol &mol, bool explicitOnly=false, bool addCoords=false) |
ROMol * | removeHs (const ROMol &mol, bool implicitOnly=false, bool updateExplicitCount=false, bool sanitize=true) |
returns a copy of a molecule with hydrogens removed
void | removeHs (RWMol &mol, bool implicitOnly=false, bool updateExplicitCount=false, bool sanitize=true) |
ROMol * | mergeQueryHs (const ROMol &mol, bool mergeUnmappedOnly=false) |
void | mergeQueryHs (RWMol &mol, bool mergeUnmappedOnly=false) |
ROMol * | renumberAtoms (const ROMol &mol, const std::vector< unsigned int > &newOrder) |
returns a copy of a molecule with the atoms renumbered
Ring finding and SSSR
int | findSSSR (const ROMol &mol, std::vector< std::vector< int > > &res) |
finds a molecule's Smallest Set of Smallest Rings
int | findSSSR (const ROMol &mol, std::vector< std::vector< int > > *res=0) |
void | fastFindRings (const ROMol &mol) |
use a DFS algorithm to identify ring bonds and atoms in a molecule
int | symmetrizeSSSR (ROMol &mol, std::vector< std::vector< int > > &res) |
symmetrize the molecule's Smallest Set of Smallest Rings
int | symmetrizeSSSR (ROMol &mol) |
Shortest paths and other matrices
double * | getAdjacencyMatrix (const ROMol &mol, bool useBO=false, int emptyVal=0, bool force=false, const char *propNamePrefix=0, const boost::dynamic_bitset<> *bondsToUse=0) |
returns a molecule's adjacency matrix
double * | getDistanceMat (const ROMol &mol, bool useBO=false, bool useAtomWts=false, bool force=false, const char *propNamePrefix=0) |
Computes the molecule's topological distance matrix.
double * | getDistanceMat (const ROMol &mol, const std::vector< int > &activeAtoms, const std::vector< const Bond * > &bonds, bool useBO=false, bool useAtomWts=false) |
Computes the molecule's topological distance matrix.
double * | get3DDistanceMat (const ROMol &mol, int confId=-1, bool useAtomWts=false, bool force=false, const char *propNamePrefix=0) |
Computes the molecule's 3D distance matrix.
std::list< int > | getShortestPath (const ROMol &mol, int aid1, int aid2) |
Find the shortest path between two atoms.
Stereochemistry
void | cleanupChirality (RWMol &mol) |
removes bogus chirality markers (those on non-sp3 centers)
void | assignChiralTypesFrom3D (ROMol &mol, int confId=-1, bool replaceExistingTags=true) |
Uses a conformer to assign ChiralType to a molecule's atoms.
void | assignStereochemistry (ROMol &mol, bool cleanIt=false, bool force=false, bool flagPossibleStereoCenters=false) |
Assign stereochemistry tags to atoms (i.e. R/S) and bonds (i.e. Z/E)
void | removeStereochemistry (ROMol &mol) |
Removes all stereochemistry information from atoms (i.e. R/S) and bonds (i.e. Z/E)
void | findPotentialStereoBonds (ROMol &mol, bool cleanIt=false) |
finds bonds that could be cis/trans in a molecule and marks them as Bond::STEREONONE
Sanitization
enum | SanitizeFlags { SANITIZE_NONE =0x0, SANITIZE_CLEANUP =0x1, SANITIZE_PROPERTIES =0x2, SANITIZE_SYMMRINGS =0x4, SANITIZE_KEKULIZE =0x8, SANITIZE_FINDRADICALS =0x10, SANITIZE_SETAROMATICITY =0x20, SANITIZE_SETCONJUGATION =0x40, SANITIZE_SETHYBRIDIZATION =0x80, SANITIZE_CLEANUPCHIRALITY =0x100, SANITIZE_ADJUSTHS =0x200, SANITIZE_ALL =0xFFFFFFF } |
void | sanitizeMol (RWMol &mol, unsigned int &operationThatFailed, unsigned int sanitizeOps=SANITIZE_ALL) |
carries out a collection of tasks for cleaning up a molecule and ensuring that it makes "chemical sense"
void | sanitizeMol (RWMol &mol) |
int | setAromaticity (RWMol &mol) |
Sets up the aromaticity for a molecule.
void | cleanUp (RWMol &mol) |
Designed to be called by the sanitizer to handle special cases before anything is done.
void | assignRadicals (RWMol &mol) |
Called by the sanitizer to assign radical counts to atoms.
void | adjustHs (RWMol &mol) |
adjust the number of implicit and explicit Hs for special cases
void | Kekulize (RWMol &mol, bool markAtomsBonds=true, unsigned int maxBackTracks=100) |
Kekulizes the molecule.
void | setConjugation (ROMol &mol) |
flags the molecule's conjugated bonds
void | setHybridization (ROMol &mol) |
calculates and sets the hybridization of all a molecule's Atoms
Groups a variety of molecular query and transformation operations.
ROMol* RDKit::MolOps::addHs | ( | const ROMol & | mol, |
bool | explicitOnly = false , |
||
bool | addCoords = false |
||
) |
returns a copy of a molecule with hydrogens added in as explicit Atoms
mol | the molecule to add Hs to |
explicitOnly | (optional) if this is true, only explicit Hs will be added |
addCoords | (optional) If this is true, estimates for the atomic coordinates of the added Hs will be used. |
Notes:
- it makes no sense to use the addCoords option if the molecule's heavy atoms don't already have coordinates.
- the caller is responsible for deleting the pointer this returns.
void RDKit::MolOps::addHs | ( | RWMol & | mol, |
bool | explicitOnly = false , |
||
bool | addCoords = false |
||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
void RDKit::MolOps::adjustHs | ( | RWMol & | mol | ) |
adjust the number of implicit and explicit Hs for special cases
Currently this:
"c1cc[nH]cc1"
mol | the molecule of interest |
Assumptions
void RDKit::MolOps::assignChiralTypesFrom3D | ( | ROMol & | mol, |
int | confId = -1 , |
||
bool | replaceExistingTags = true |
||
) |
Uses a conformer to assign ChiralType to a molecule's atoms.
mol | the molecule of interest |
confId | the conformer to use |
replaceExistingTags | if this flag is true, any existing atomic chiral tags will be replaced |
If the conformer provided is not a 3D conformer, nothing will be done.
void RDKit::MolOps::assignRadicals | ( | RWMol & | mol | ) |
Called by the sanitizer to assign radical counts to atoms.
void RDKit::MolOps::assignStereochemistry | ( | ROMol & | mol, |
bool | cleanIt = false , |
||
bool | force = false , |
||
bool | flagPossibleStereoCenters = false |
||
) |
Assign stereochemistry tags to atoms (i.e. R/S) and bonds (i.e. Z/E)
mol | the molecule of interest |
cleanIt | toggles removal of stereo flags from double bonds that can not have stereochemistry |
force | forces the calculation to be repeated even if it has already been done |
flagPossibleStereoCenters | set the _ChiralityPossible property on atoms that are possible stereocenters |
Notes:
bool RDKit::MolOps::atomHasConjugatedBond | ( | const Atom * | at | ) |
returns whether or not the given Atom is involved in a conjugated bond
void RDKit::MolOps::cleanUp | ( | RWMol & | mol | ) |
Designed to be called by the sanitizer to handle special cases before anything is done.
Currently this:
"[N+](=O)[O-]"
mol | the molecule of interest |
void RDKit::MolOps::cleanupChirality | ( | RWMol & | mol | ) |
removes bogus chirality markers (those on non-sp3 centers)
double RDKit::MolOps::computeBalabanJ | ( | const ROMol & | mol, |
bool | useBO = true , |
||
bool | force = false , |
||
const std::vector< int > * | bondPath = 0 , |
||
bool | cacheIt = true |
||
) |
calculates Balaban's J index for the molecule
mol | the molecule of interest |
useBO | toggles inclusion of the bond order in the calculation (when false, we're not really calculating the J value) |
force | forces the calculation (instead of using cached results) |
bondPath | when included, only paths using bonds whose indices occur in this vector will be included in the calculation |
cacheIt | If this is true, the calculated value will be cached as a property on the molecule |
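As a rough illustration of the quantity being computed, the textbook form of Balaban's index on an unweighted graph is J = m/(mu+1) times the sum over bonds of 1/sqrt(s_i * s_j), where m is the number of bonds, mu = m - n + 1 is the cyclomatic number of a connected graph with n atoms, and s_i is the sum of topological distances from atom i. The sketch below is standalone C++ and does not call RDKit. The function name `balabanJ` and its argument layout are hypothetical, and it ignores the bond-order and atom-weight handling that the `useBO` option implies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not RDKit's implementation): Balaban's J for a
// connected, unweighted graph. distMat is a row-major nAtoms x nAtoms
// topological distance matrix; bonds lists the bonded atom index pairs.
double balabanJ(const std::vector<double> &distMat, std::size_t nAtoms,
                const std::vector<std::pair<std::size_t, std::size_t>> &bonds) {
  assert(distMat.size() == nAtoms * nAtoms);
  assert(!bonds.empty());

  // s[i] is the distance sum of atom i (sum of row i of the matrix).
  std::vector<double> s(nAtoms, 0.0);
  for (std::size_t i = 0; i < nAtoms; ++i) {
    for (std::size_t j = 0; j < nAtoms; ++j) {
      s[i] += distMat[i * nAtoms + j];
    }
  }

  // Each bond contributes 1/sqrt(s_i * s_j).
  double accum = 0.0;
  for (const auto &b : bonds) {
    assert(b.first < nAtoms && b.second < nAtoms);
    accum += 1.0 / std::sqrt(s[b.first] * s[b.second]);
  }

  const double nBonds = static_cast<double>(bonds.size());
  // Cyclomatic number of a connected graph: bonds - atoms + 1.
  const double mu = nBonds - static_cast<double>(nAtoms) + 1.0;
  return nBonds / (mu + 1.0) * accum;
}
```

For a three-atom chain the distance sums are 3, 2 and 3, there are no rings, and the expression reduces to 4/sqrt(6).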
double RDKit::MolOps::computeBalabanJ | ( | double * | distMat, |
int | nb, | ||
int | nAts | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
int RDKit::MolOps::countAtomElec | ( | const Atom * | at | ) |
return the number of electrons available on an atom to donate for aromaticity
The result is determined using the default valency, number of lone pairs, number of bonds and the formal charge. Note that the atom may not donate all of these electrons to a ring for aromaticity. This function is also used in the conjugation and hybridization code.
at | the atom of interest |
void RDKit::MolOps::fastFindRings | ( | const ROMol & | mol | ) |
use a DFS algorithm to identify ring bonds and atoms in a molecule
NOTE: though the RingInfo structure is populated by this function, the only really reliable calls that can be made are to check whether mol.getRingInfo()->numAtomRings(idx) or mol.getRingInfo()->numBondRings(idx) return values > 0.
void RDKit::MolOps::findPotentialStereoBonds | ( | ROMol & | mol, |
bool | cleanIt = false |
||
) |
finds bonds that could be cis/trans in a molecule and marks them as Bond::STEREONONE
mol | the molecule of interest |
cleanIt | toggles removal of stereo flags from double bonds that can not have stereochemistry |
This function is useful in two situations
int RDKit::MolOps::findSSSR | ( | const ROMol & | mol, |
std::vector< std::vector< int > > & | res | ||
) |
finds a molecule's Smallest Set of Smallest Rings
Currently this implements a modified form of Figueras' algorithm (JCICS - Vol. 36, No. 5, 1996, 986-991).
mol | the molecule of interest |
res | used to return the vector of rings. Each entry is a vector with atom indices. This information is also stored in the molecule's RingInfo structure, so this argument is optional (see overload) |
Base algorithm:
Our Modifications:
These changes were motivated by several factors:
Referenced by RDKit::Drawing::DrawMol().
int RDKit::MolOps::findSSSR | ( | const ROMol & | mol, |
std::vector< std::vector< int > > * | res = 0 |
||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
double* RDKit::MolOps::get3DDistanceMat | ( | const ROMol & | mol, |
int | confId = -1 , |
||
bool | useAtomWts = false , |
||
bool | force = false , |
||
const char * | propNamePrefix = 0 |
||
) |
Computes the molecule's 3D distance matrix.
mol | the molecule of interest |
confId | the conformer to use |
useAtomWts | sets the diagonal elements of the result to 6.0/(atomic number) |
force | forces calculation of the matrix, even if already computed |
propNamePrefix | used to set the cached property name |
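To make the idea of a 3D distance matrix concrete, here is a small standalone sketch that fills a symmetric matrix of Euclidean distances from a list of coordinates. It does not use RDKit. The `Xyz` struct, the function name `distanceMatrix3D` and the flat row-major layout are assumptions made for this illustration, and the `useAtomWts` diagonal is not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical coordinate type for this sketch.
struct Xyz {
  double x, y, z;
};

// Returns a row-major n x n matrix whose (i, j) entry is the Euclidean
// distance between points i and j. The diagonal stays at 0.
std::vector<double> distanceMatrix3D(const std::vector<Xyz> &coords) {
  const std::size_t n = coords.size();
  std::vector<double> d(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < i; ++j) {
      const double dx = coords[i].x - coords[j].x;
      const double dy = coords[i].y - coords[j].y;
      const double dz = coords[i].z - coords[j].z;
      const double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
      // The matrix is symmetric, so fill both triangles at once.
      d[i * n + j] = dist;
      d[j * n + i] = dist;
    }
  }
  assert(d.size() == n * n);
  return d;
}
```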
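Notes
- The result of this is cached in the molecule's local property dictionary, which will handle deallocation. The caller should not delete this pointer.
double* RDKit::MolOps::getAdjacencyMatrix | ( | const ROMol & | mol, |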
bool | useBO = false , |
||
int | emptyVal = 0 , |
||
bool | force = false , |
||
const char * | propNamePrefix = 0 , |
||
const boost::dynamic_bitset<> * | bondsToUse = 0 |
||
) |
returns a molecule's adjacency matrix
mol | the molecule of interest |
useBO | toggles use of bond orders in the matrix |
emptyVal | sets the empty value (for non-adjacent atoms) |
force | forces calculation of the matrix, even if already computed |
propNamePrefix | used to set the cached property name |
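Notes
- The result of this is cached in the molecule's local property dictionary, which will handle deallocation. The caller should not delete this pointer.
double* RDKit::MolOps::getDistanceMat | ( | const ROMol & | mol, |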
bool | useBO = false , |
||
bool | useAtomWts = false , |
||
bool | force = false , |
||
const char * | propNamePrefix = 0 |
||
) |
Computes the molecule's topological distance matrix.
Uses the Floyd-Warshall all-pairs-shortest-paths algorithm.
mol | the molecule of interest |
useBO | toggles use of bond orders in the matrix |
useAtomWts | sets the diagonal elements of the result to 6.0/(atomic number) so that the matrix can be used to calculate Balaban J values. This does not affect the bond weights. |
force | forces calculation of the matrix, even if already computed |
propNamePrefix | used to set the cached property name |
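The Floyd-Warshall step mentioned above can be sketched in a few lines. The following standalone C++ does not call RDKit. It treats every bond as length 1 (roughly the `useBO=false`, `useAtomWts=false` case), and the function name `topologicalDistances`, the flat row-major layout and the sentinel value for unreachable atoms are all assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not RDKit's implementation): all-pairs topological
// distances for an unweighted molecular graph, row-major in a flat vector.
std::vector<double> topologicalDistances(
    std::size_t nAtoms,
    const std::vector<std::pair<std::size_t, std::size_t>> &bonds) {
  const double unreachable = 1e8;  // sentinel for atoms in different fragments
  std::vector<double> d(nAtoms * nAtoms, unreachable);
  for (std::size_t i = 0; i < nAtoms; ++i) {
    d[i * nAtoms + i] = 0.0;
  }
  for (const auto &b : bonds) {
    assert(b.first < nAtoms && b.second < nAtoms);
    d[b.first * nAtoms + b.second] = 1.0;
    d[b.second * nAtoms + b.first] = 1.0;
  }
  // Floyd-Warshall: for each intermediate atom k, check whether going
  // through k shortens the current best path from i to j.
  for (std::size_t k = 0; k < nAtoms; ++k) {
    for (std::size_t i = 0; i < nAtoms; ++i) {
      for (std::size_t j = 0; j < nAtoms; ++j) {
        const double viaK = d[i * nAtoms + k] + d[k * nAtoms + j];
        if (viaK < d[i * nAtoms + j]) {
          d[i * nAtoms + j] = viaK;
        }
      }
    }
  }
  return d;
}
```

The triple loop makes the cost cubic in the number of atoms, which is one reason a cached result is worth reusing.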
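Notes
- The result of this is cached in the molecule's local property dictionary, which will handle deallocation. The caller should not delete this pointer.
double* RDKit::MolOps::getDistanceMat | ( | const ROMol & | mol, |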
const std::vector< int > & | activeAtoms, | ||
const std::vector< const Bond * > & | bonds, | ||
bool | useBO = false , |
||
bool | useAtomWts = false |
||
) |
Computes the molecule's topological distance matrix.
Uses the Floyd-Warshall all-pairs-shortest-paths algorithm.
mol | the molecule of interest |
activeAtoms | only elements corresponding to these atom indices will be included in the calculation |
bonds | only bonds found in this list will be included in the calculation |
useBO | toggles use of bond orders in the matrix |
useAtomWts | sets the diagonal elements of the result to 6.0/(atomic number) so that the matrix can be used to calculate Balaban J values. This does not affect the bond weights. |
Notes
- The results of this call are not cached, the caller should delete this pointer.
int RDKit::MolOps::getFormalCharge | ( | const ROMol & | mol | ) |
sums up all atomic formal charges and returns the result
unsigned int RDKit::MolOps::getMolFrags | ( | const ROMol & | mol, |
std::vector< int > & | mapping | ||
) |
find fragments (disconnected components of the molecular graph)
mol | the molecule of interest |
mapping | used to return the mapping of Atoms->fragments. On return mapping will be mol.getNumAtoms() long and will contain the fragment assignment for each Atom |
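The shape of the `mapping` result is easiest to see on a toy graph. The sketch below is standalone C++ and does not call RDKit. It labels connected components with a depth-first traversal; the function name `labelFragments` and the bond-pair input are hypothetical stand-ins for the molecule argument.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the mapping-returning overload: assigns each atom
// a fragment index in `mapping` and returns the number of fragments found.
unsigned int labelFragments(unsigned int nAtoms,
                            const std::vector<std::pair<int, int>> &bonds,
                            std::vector<int> &mapping) {
  // Build neighbour lists from the bond list.
  std::vector<std::vector<int>> nbrs(nAtoms);
  for (const auto &b : bonds) {
    assert(b.first >= 0 && static_cast<unsigned int>(b.first) < nAtoms);
    assert(b.second >= 0 && static_cast<unsigned int>(b.second) < nAtoms);
    nbrs[b.first].push_back(b.second);
    nbrs[b.second].push_back(b.first);
  }

  mapping.assign(nAtoms, -1);  // -1 marks an atom that has not been visited
  unsigned int nFrags = 0;
  for (unsigned int start = 0; start < nAtoms; ++start) {
    if (mapping[start] != -1) continue;
    // Depth-first traversal of everything reachable from `start`.
    std::vector<int> stack(1, static_cast<int>(start));
    mapping[start] = static_cast<int>(nFrags);
    while (!stack.empty()) {
      const int at = stack.back();
      stack.pop_back();
      for (int nbr : nbrs[at]) {
        if (mapping[nbr] == -1) {
          mapping[nbr] = static_cast<int>(nFrags);
          stack.push_back(nbr);
        }
      }
    }
    ++nFrags;
  }
  return nFrags;
}
```

With five atoms and bonds 0-1, 1-2 and 3-4, the sketch reports two fragments and a mapping with one entry per atom.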
unsigned int RDKit::MolOps::getMolFrags | ( | const ROMol & | mol, |
std::vector< std::vector< int > > & | frags | ||
) |
find fragments (disconnected components of the molecular graph)
mol | the molecule of interest |
frags | used to return the Atoms in each fragment. On return frags will be numFrags long, and each entry will contain the indices of the Atoms in that fragment. |
std::vector<boost::shared_ptr<ROMol> > RDKit::MolOps::getMolFrags | ( | const ROMol & | mol, |
bool | sanitizeFrags = true , |
||
std::vector< int > * | frags = 0 , |
||
std::vector< std::vector< int > > * | fragsMolAtomMapping = 0 , |
||
bool | copyConformers = true |
||
) |
splits a molecule into its component fragments
mol | the molecule of interest |
sanitizeFrags | toggles sanitization of the fragments after they are built |
frags | used to return the mapping of Atoms->fragments. If provided, frags will be mol.getNumAtoms() long on return and will contain the fragment assignment for each Atom |
fragsMolAtomMapping | used to return the Atoms in each fragment. On return fragsMolAtomMapping will be numFrags long, and each entry will contain the indices of the Atoms in that fragment. |
copyConformers | toggles copying conformers of the fragments after they are built |
std::map<T,boost::shared_ptr<ROMol> > RDKit::MolOps::getMolFragsWithQuery | ( | const ROMol & | mol, |
T(*)(const ROMol &, const Atom *) | query, | ||
bool | sanitizeFrags = true , |
||
const std::vector< T > * | whiteList = 0 , |
||
bool | negateList = false |
||
) |
splits a molecule into pieces based on labels assigned using a query
mol | the molecule of interest |
query | the query used to "label" the molecule for fragmentation |
sanitizeFrags | toggles sanitization of the fragments after they are built |
whiteList | if provided, only labels in the list will be kept |
negateList | if true, the white list logic will be inverted: only labels not in the list will be kept |
unsigned RDKit::MolOps::getNumAtomsWithDistinctProperty | ( | const ROMol & | mol, |
std::string | prop | ||
) |
returns the number of atoms which have a particular property set
std::list<int> RDKit::MolOps::getShortestPath | ( | const ROMol & | mol, |
int | aid1, | ||
int | aid2 | ||
) |
Find the shortest path between two atoms.
Uses the Bellman-Ford algorithm
mol | molecule of interest |
aid1 | index of the first atom |
aid2 | index of the second atom |
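For readers unfamiliar with Bellman-Ford, the following standalone sketch shows the relaxation loop on an unweighted bond list and then walks the predecessor links back to build the path. It does not call RDKit. The function name `shortestPath` and the bond-pair input are hypothetical. This sketch includes both end atoms in the returned list and returns an empty list when no path exists; both choices are assumptions of the example, not statements about the library.

```cpp
#include <cassert>
#include <limits>
#include <list>
#include <utility>
#include <vector>

// Hypothetical helper (not RDKit's implementation): Bellman-Ford shortest
// path between atoms aid1 and aid2, with every bond counted as length 1.
std::list<int> shortestPath(int nAtoms,
                            const std::vector<std::pair<int, int>> &bonds,
                            int aid1, int aid2) {
  assert(aid1 >= 0 && aid1 < nAtoms);
  assert(aid2 >= 0 && aid2 < nAtoms);
  const int inf = std::numeric_limits<int>::max();
  std::vector<int> dist(nAtoms, inf);
  std::vector<int> prev(nAtoms, -1);
  dist[aid1] = 0;

  // At most nAtoms-1 rounds; each round relaxes every bond in both directions.
  for (int round = 0; round + 1 < nAtoms; ++round) {
    bool changed = false;
    for (const auto &b : bonds) {
      const int ends[2][2] = {{b.first, b.second}, {b.second, b.first}};
      for (const auto &e : ends) {
        if (dist[e[0]] != inf && dist[e[0]] + 1 < dist[e[1]]) {
          dist[e[1]] = dist[e[0]] + 1;
          prev[e[1]] = e[0];
          changed = true;
        }
      }
    }
    if (!changed) break;  // converged early
  }

  std::list<int> path;
  if (dist[aid2] == inf) return path;  // atoms are in different fragments
  // Walk predecessor links from the target back to the source.
  for (int at = aid2; at != -1; at = prev[at]) {
    path.push_front(at);
  }
  return path;
}
```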
Notes:
void RDKit::MolOps::Kekulize | ( | RWMol & | mol, |
bool | markAtomsBonds = true , |
||
unsigned int | maxBackTracks = 100 |
||
) |
Kekulizes the molecule.
mol | the molecule of interest |
markAtomsBonds | if this is set to true, isAromatic boolean settings on both the Bonds and Atoms are turned to false following the Kekulization, otherwise they are left alone in their original state. |
maxBackTracks | the maximum number of attempts at back-tracking. The algorithm uses a back-tracking procedure to revisit a previous double-bond assignment if we hit a wall in the kekulization process |
Notes:
- even if markAtomsBonds is false, the BondType for all aromatic bonds will be changed from RDKit::Bond::AROMATIC to RDKit::Bond::SINGLE or RDKit::Bond::DOUBLE during Kekulization.
Referenced by RDKit::Drawing::MolToDrawing().
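ROMol* RDKit::MolOps::mergeQueryHs | ( | const ROMol & | mol, |
bool | mergeUnmappedOnly = false |
||
) |
returns a copy of a molecule with hydrogens removed and added as queries to the heavy atoms to which they are bound.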
This is really intended to be used with molecules that contain QueryAtoms
mol | the molecule to remove Hs from |
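Notes:
- Hydrogens which aren't connected to a heavy atom will not be removed. This prevents molecules like "[H][H]" from having all atoms removed.
- the caller is responsible for deleting the pointer this returns.
void RDKit::MolOps::mergeQueryHs | ( | RWMol & | mol, |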
bool | mergeUnmappedOnly = false |
||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
ROMol* RDKit::MolOps::removeHs | ( | const ROMol & | mol, |
bool | implicitOnly = false , |
||
bool | updateExplicitCount = false , |
||
bool | sanitize = true |
||
) |
returns a copy of a molecule with hydrogens removed
mol | the molecule to remove Hs from |
implicitOnly | (optional) if this is true, only implicit Hs will be removed |
updateExplicitCount | (optional) If this is true, when explicit Hs are removed from the graph, the heavy atom to which they are bound will have its counter of explicit Hs increased. |
sanitize | (optional) If this is true, the final molecule will be sanitized |
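Notes:
- Hydrogens which aren't connected to a heavy atom will not be removed. This prevents molecules like "[H][H]" from having all atoms removed.
- the caller is responsible for deleting the pointer this returns.
Referenced by RDKit::ForwardSDMolSupplier::ForwardSDMolSupplier(), RDKit::SDMolSupplier::SDMolSupplier(), and RDKit::SDMolSupplier::~SDMolSupplier().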
void RDKit::MolOps::removeHs | ( | RWMol & | mol, |
bool | implicitOnly = false , |
||
bool | updateExplicitCount = false , |
||
bool | sanitize = true |
||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
void RDKit::MolOps::removeStereochemistry | ( | ROMol & | mol | ) |
Removes all stereochemistry information from atoms (i.e. R/S) and bonds (i.e. Z/E)
mol | the molecule of interest |
ROMol* RDKit::MolOps::renumberAtoms | ( | const ROMol & | mol, |
const std::vector< unsigned int > & | newOrder | ||
) |
returns a copy of a molecule with the atoms renumbered
mol | the molecule to work with |
newOrder | the new ordering of the atoms (should be numAtoms long). For example, if newOrder is [3,2,0,1], then atom 3 in the original molecule will be atom 0 in the new one |
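The direction of the `newOrder` mapping is easy to get backwards: entry i names the old index of the atom that ends up at new position i. The standalone sketch below applies that rule to a plain list of element symbols. It does not call RDKit, and the function name `reorder` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the atoms rearranged so that the atom at old
// index newOrder[i] sits at new index i.
std::vector<std::string> reorder(const std::vector<std::string> &atoms,
                                 const std::vector<unsigned int> &newOrder) {
  assert(newOrder.size() == atoms.size());
  std::vector<bool> seen(atoms.size(), false);
  std::vector<std::string> res;
  res.reserve(atoms.size());
  for (unsigned int oldIdx : newOrder) {
    // newOrder must be a permutation: every old index exactly once.
    assert(oldIdx < atoms.size() && !seen[oldIdx]);
    seen[oldIdx] = true;
    res.push_back(atoms[oldIdx]);
  }
  return res;
}
```

With the atoms C, N, O, S and the ordering [3,2,0,1] from the parameter description, the old atom 3 (S) lands at position 0.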
Notes:
- the caller is responsible for deleting the pointer this returns.
void RDKit::MolOps::sanitizeMol | ( | RWMol & | mol, |
unsigned int & | operationThatFailed, | ||
unsigned int | sanitizeOps = SANITIZE_ALL |
||
) |
carries out a collection of tasks for cleaning up a molecule and ensuring that it makes "chemical sense"
This function calls the following in sequence
mol | the RWMol to be cleaned |
operationThatFailed | the first (if any) sanitization operation that fails is set here. The values are taken from the SanitizeFlags enum. On success, the value is SanitizeFlags::SANITIZE_NONE |
sanitizeOps | the bits here are used to set which sanitization operations are carried out. The elements of the SanitizeFlags enum define the operations. |
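Because `sanitizeOps` is a bit mask, individual operations can be switched on or off with ordinary bitwise operators. The sketch below is standalone C++ that redeclares the enum locally, with values copied from the SanitizeFlags listing on this page, so that it compiles without RDKit. The helper names `isRequested` and `allExcept` are hypothetical and are not part of the library.

```cpp
#include <cassert>

// Local copy of the enum values listed on this page, for illustration only.
enum SanitizeFlags {
  SANITIZE_NONE = 0x0,
  SANITIZE_CLEANUP = 0x1,
  SANITIZE_PROPERTIES = 0x2,
  SANITIZE_SYMMRINGS = 0x4,
  SANITIZE_KEKULIZE = 0x8,
  SANITIZE_FINDRADICALS = 0x10,
  SANITIZE_SETAROMATICITY = 0x20,
  SANITIZE_SETCONJUGATION = 0x40,
  SANITIZE_SETHYBRIDIZATION = 0x80,
  SANITIZE_CLEANUPCHIRALITY = 0x100,
  SANITIZE_ADJUSTHS = 0x200,
  SANITIZE_ALL = 0xFFFFFFF
};

// Hypothetical helper: true if operation `op` is switched on in the mask.
bool isRequested(unsigned int sanitizeOps, SanitizeFlags op) {
  assert(op != SANITIZE_NONE);  // the empty flag cannot be tested for
  return (sanitizeOps & static_cast<unsigned int>(op)) != 0u;
}

// Hypothetical helper: a mask that requests everything except `op`.
unsigned int allExcept(SanitizeFlags op) {
  return static_cast<unsigned int>(SANITIZE_ALL) &
         ~static_cast<unsigned int>(op);
}
```

A mask built this way would be passed as the third argument of the three-argument overload; flags can also be combined directly, for example `SANITIZE_CLEANUP | SANITIZE_PROPERTIES`.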
Notes:
- If there is a failure in the sanitization, a SanitException will be thrown.
void RDKit::MolOps::sanitizeMol | ( | RWMol & | mol | ) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
int RDKit::MolOps::setAromaticity | ( | RWMol & | mol | ) |
Sets up the aromaticity for a molecule.
This is what happens here:
mol | the RWMol of interest |
Assumptions:
- Kekulization has been done (i.e. MolOps::Kekulize() has already been called)
void RDKit::MolOps::setConjugation | ( | ROMol & | mol | ) |
flags the molecule's conjugated bonds
void RDKit::MolOps::setHybridization | ( | ROMol & | mol | ) |
calculates and sets the hybridization of all a molecule's Atoms
int RDKit::MolOps::symmetrizeSSSR | ( | ROMol & | mol, |
std::vector< std::vector< int > > & | res | ||
) |
symmetrize the molecule's Smallest Set of Smallest Rings
SSSR rings obtained from "findSSSR" can be non-unique in some cases. For example, cubane has five SSSR rings, not six as one would hope.
This function adds additional rings to the SSSR list if necessary to make the list symmetric, e.g. all atoms in cubane will be part of the same number of SSSRs. This function chooses these extra rings from the extra rings computed and discarded during findSSSR. The new rings are chosen such that:
mol | the molecule of interest |
res | used to return the vector of rings. Each entry is a vector with atom indices. This information is also stored in the molecule's RingInfo structure, so this argument is optional (see overload) |
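The cubane figure quoted above follows from a general counting rule: the number of rings in an SSSR equals bonds minus atoms plus the number of connected fragments (the cyclomatic number of the graph). The standalone sketch below just evaluates that expression. The function name `sssrSize` is hypothetical and nothing here calls RDKit.

```cpp
#include <cassert>

// Hypothetical helper: expected number of rings in an SSSR for a molecular
// graph with the given atom, bond and fragment counts.
unsigned int sssrSize(unsigned int nAtoms, unsigned int nBonds,
                      unsigned int nFrags = 1) {
  // Any graph satisfies bonds + fragments >= atoms.
  assert(nBonds + nFrags >= nAtoms);
  return nBonds + nFrags - nAtoms;
}
```

Cubane has 8 heavy atoms and 12 bonds between them, so the rule gives 5 rings even though the cage has 6 equivalent faces, which is the asymmetry this function is meant to repair.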
Notes:
int RDKit::MolOps::symmetrizeSSSR | ( | ROMol & | mol | ) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.