RDKit
Open-source cheminformatics and machine learning.
RDKit::SubstructLibrary Class Reference

Substructure search of a library of molecules.

#include <SubstructLibrary.h>

Public Member Functions

 SubstructLibrary ()
 
 SubstructLibrary (boost::shared_ptr< MolHolderBase > molecules)
 
 SubstructLibrary (boost::shared_ptr< MolHolderBase > molecules, boost::shared_ptr< FPHolderBase > fingerprints)
 
MolHolderBase & getMolHolder ()
 Get the underlying molecule holder implementation.
 
const MolHolderBase & getMolecules () const
 
FPHolderBase & getFingerprints ()
 Get the underlying fingerprint implementation.
 
const FPHolderBase & getFingerprints () const
 
unsigned int addMol (const ROMol &mol)
 Add a molecule to the library.
 
std::vector< unsigned int > getMatches (const ROMol &query, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1, int maxResults=-1)
 Get the matching indices for the query.
 
std::vector< unsigned int > getMatches (const ROMol &query, unsigned int startIdx, unsigned int endIdx, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1, int maxResults=-1)
 Get the matching indices for the query between the given indices.
 
unsigned int countMatches (const ROMol &query, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1)
 Return the number of matches for the query.
 
unsigned int countMatches (const ROMol &query, unsigned int startIdx, unsigned int endIdx, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1)
 Return the number of matches for the query between the given indices.
 
bool hasMatch (const ROMol &query, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1)
 Returns true if any match exists for the query.
 
bool hasMatch (const ROMol &query, unsigned int startIdx, unsigned int endIdx, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1)
 
boost::shared_ptr< ROMol > getMol (unsigned int idx) const
 Returns the molecule at the given index.
 
boost::shared_ptr< ROMol > operator[] (unsigned int idx)
 Returns the molecule at the given index.
 
unsigned int size () const
 Return the number of molecules in the library.
 

Detailed Description

Substructure search of a library of molecules.

This class allows for multithreaded substructure searches of large datasets.

The implementations can use fingerprints to speed up searches and have molecules cached as binary forms to reduce memory usage.
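
To illustrate why fingerprints can speed up a search, here is a minimal sketch of bit-subset screening, the general idea behind fingerprint prefilters. It deliberately uses std::bitset instead of RDKit types, and the names Fingerprint, passesScreen and screenCandidates are hypothetical, not part of the RDKit API. The assumption is that a molecule can only contain the query if every bit set in the query fingerprint is also set in the molecule fingerprint; survivors of the screen would still need a full atom-by-atom match.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width fingerprint type, for illustration only.
using Fingerprint = std::bitset<64>;

// True when every bit set in `query` is also set in `target`.
// A false result rules the target out; a true result only marks a candidate.
bool passesScreen(const Fingerprint &query, const Fingerprint &target) {
  return (query & target) == query;
}

// Returns the indices of the fingerprints that survive the screen.
std::vector<unsigned int> screenCandidates(
    const Fingerprint &query, const std::vector<Fingerprint> &library) {
  std::vector<unsigned int> candidates;
  for (std::size_t i = 0; i < library.size(); ++i) {
    if (passesScreen(query, library[i])) {
      candidates.push_back(static_cast<unsigned int>(i));
    }
  }
  return candidates;
}
```

Only the candidates returned here would go on to the more expensive substructure match, which is where the time saving comes from.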

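Basic usage:

SubstructLibrary lib;
lib.addMol(mol);  // mol is an ROMol; repeat for each molecule
// Indices of all library molecules that contain the query
std::vector<unsigned int> results = lib.getMatches(query);
for (std::vector<unsigned int>::const_iterator matchIndex = results.begin();
     matchIndex != results.end(); ++matchIndex) {
  // Retrieve the matching molecule by its library index
  boost::shared_ptr<ROMol> match = lib.getMol(*matchIndex);
}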

Using different mol holders and pattern fingerprints:

boost::shared_ptr<CachedTrustedSmilesMolHolder> molHolder =
    boost::make_shared<CachedTrustedSmilesMolHolder>();
boost::shared_ptr<PatternHolder> patternHolder =
    boost::make_shared<PatternHolder>();
SubstructLibrary lib(molHolder, patternHolder);
lib.addMol(mol);

Cached molecule holders create molecules on demand. There are currently three styles of cached molecules.

CachedMolHolder: stores molecules in the RDKit binary format.
CachedSmilesMolHolder: stores molecules in smiles format.
CachedTrustedSmilesMolHolder: stores molecules in smiles format, taken from a trusted source.

The CachedTrustedSmilesMolHolder is made to add molecules from a trusted source. This makes the basic assumption that RDKit was used to sanitize and canonicalize the smiles string. In practice this is considerably faster than using arbitrary smiles strings since certain assumptions can be made.

When loading from external data, as opposed to using the "addMol" API, care must be taken to ensure that the pattern fingerprints and smiles are synchronized.

Each pattern holder has an API point for making its fingerprint. This is useful to ensure that the pattern stored in the database will be compatible with the patterns made when analyzing queries.

boost::shared_ptr<CachedTrustedSmilesMolHolder> molHolder =
    boost::make_shared<CachedTrustedSmilesMolHolder>();
boost::shared_ptr<PatternHolder> patternHolder =
    boost::make_shared<PatternHolder>();
// The PatternHolder instance is able to make fingerprints.
// These, of course, can be read from a file. For demonstration
// purposes we construct them here.
const std::string trustedSmiles = "c1ccccc1";
ROMol *m = SmilesToMol(trustedSmiles);
const ExplicitBitVect *bitVector = patternHolder->makeFingerprint(*m);
// The trusted smiles and bitVector can be read from any source.
// This is the fastest way to load a substruct library.
molHolder->addSmiles(trustedSmiles);
patternHolder->addFingerprint(*bitVector);
SubstructLibrary lib(molHolder, patternHolder);
delete m;
delete bitVector;

Definition at line 342 of file SubstructLibrary.h.

Constructor & Destructor Documentation

◆ SubstructLibrary() [1/3]

RDKit::SubstructLibrary::SubstructLibrary ( )
inline

Definition at line 349 of file SubstructLibrary.h.

◆ SubstructLibrary() [2/3]

RDKit::SubstructLibrary::SubstructLibrary ( boost::shared_ptr< MolHolderBase > molecules )
inline

Definition at line 355 of file SubstructLibrary.h.

◆ SubstructLibrary() [3/3]

RDKit::SubstructLibrary::SubstructLibrary ( boost::shared_ptr< MolHolderBase > molecules,
boost::shared_ptr< FPHolderBase > fingerprints
)
inline

Definition at line 358 of file SubstructLibrary.h.

Member Function Documentation

◆ addMol()

unsigned int RDKit::SubstructLibrary::addMol ( const ROMol & mol )

Add a molecule to the library.

Parameters
mol  Molecule to add

Returns the index of the molecule in the library.

◆ countMatches() [1/2]

unsigned int RDKit::SubstructLibrary::countMatches ( const ROMol & query,
bool  recursionPossible = true,
bool  useChirality = true,
bool  useQueryQueryMatches = false,
int  numThreads = -1 
)

Return the number of matches for the query.

Parameters
query                 Query to match against molecules
recursionPossible     Flags whether or not recursive matches are allowed [default true]
useChirality          Use atomic CIP codes as part of the comparison [default true]
useQueryQueryMatches  If set, the contents of atom and bond queries will be used as part of the matching [default false]
numThreads            If -1, use all available processors [default -1]

◆ countMatches() [2/2]

unsigned int RDKit::SubstructLibrary::countMatches ( const ROMol & query,
unsigned int  startIdx,
unsigned int  endIdx,
bool  recursionPossible = true,
bool  useChirality = true,
bool  useQueryQueryMatches = false,
int  numThreads = -1 
)

Return the number of matches for the query between the given indices.

Parameters
query                 Query to match against molecules
startIdx              Start index of the search
endIdx                Ending index (non-inclusive) of the search
recursionPossible     Flags whether or not recursive matches are allowed [default true]
useChirality          Use atomic CIP codes as part of the comparison [default true]
useQueryQueryMatches  If set, the contents of atom and bond queries will be used as part of the matching [default false]
numThreads            If -1, use all available processors [default -1]

◆ getFingerprints() [1/2]

FPHolderBase& RDKit::SubstructLibrary::getFingerprints ( )
inline

Get the underlying fingerprint implementation.

Throws a value error if no fingerprints have been set.

Definition at line 378 of file SubstructLibrary.h.

◆ getFingerprints() [2/2]

const FPHolderBase& RDKit::SubstructLibrary::getFingerprints ( ) const
inline

Definition at line 384 of file SubstructLibrary.h.

◆ getMatches() [1/2]

std::vector<unsigned int> RDKit::SubstructLibrary::getMatches ( const ROMol & query,
bool  recursionPossible = true,
bool  useChirality = true,
bool  useQueryQueryMatches = false,
int  numThreads = -1,
int  maxResults = -1 
)

Get the matching indices for the query.

Parameters
query                 Query to match against molecules
recursionPossible     Flags whether or not recursive matches are allowed [default true]
useChirality          Use atomic CIP codes as part of the comparison [default true]
useQueryQueryMatches  If set, the contents of atom and bond queries will be used as part of the matching [default false]
numThreads            If -1, use all available processors [default -1]
maxResults            Maximum number of results to return; -1 means return all [default -1]

◆ getMatches() [2/2]

std::vector<unsigned int> RDKit::SubstructLibrary::getMatches ( const ROMol & query,
unsigned int  startIdx,
unsigned int  endIdx,
bool  recursionPossible = true,
bool  useChirality = true,
bool  useQueryQueryMatches = false,
int  numThreads = -1,
int  maxResults = -1 
)

Get the matching indices for the query between the given indices.

Parameters
query                 Query to match against molecules
startIdx              Start index of the search
endIdx                Ending index (non-inclusive) of the search
recursionPossible     Flags whether or not recursive matches are allowed [default true]
useChirality          Use atomic CIP codes as part of the comparison [default true]
useQueryQueryMatches  If set, the contents of atom and bond queries will be used as part of the matching [default false]
numThreads            If -1, use all available processors [default -1]
maxResults            Maximum number of results to return; -1 means return all [default -1]
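
Because endIdx is non-inclusive, consecutive (startIdx, endIdx) pairs can tile a library without overlap or gaps, which is convenient for paging through results. The sketch below shows one way to compute such pairs. It uses only the standard library, and the helper name makeRanges and the chunk size are hypothetical; each resulting pair could be passed as startIdx and endIdx to the ranged getMatches overload.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, librarySize) into consecutive half-open
// ranges [start, end) holding at most chunkSize entries each.
std::vector<std::pair<unsigned int, unsigned int>> makeRanges(
    unsigned int librarySize, unsigned int chunkSize) {
  assert(chunkSize > 0);
  std::vector<std::pair<unsigned int, unsigned int>> ranges;
  unsigned int start = 0;
  while (start < librarySize) {
    // The last chunk may be shorter than chunkSize.
    unsigned int end = start + std::min(chunkSize, librarySize - start);
    ranges.emplace_back(start, end);
    start = end;  // the next range begins exactly where this one stopped
  }
  return ranges;
}
```

With a library of 10 molecules and a chunk size of 4, this yields the ranges [0, 4), [4, 8) and [8, 10).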

◆ getMol()

boost::shared_ptr<ROMol> RDKit::SubstructLibrary::getMol ( unsigned int  idx) const
inline

Returns the molecule at the given index.

Parameters
idx  Index of the molecule in the library

Definition at line 514 of file SubstructLibrary.h.

References RDKit::MolHolderBase::getMol(), and PRECONDITION.

◆ getMolecules()

const MolHolderBase& RDKit::SubstructLibrary::getMolecules ( ) const
inline

Definition at line 371 of file SubstructLibrary.h.

References PRECONDITION.

◆ getMolHolder()

MolHolderBase& RDKit::SubstructLibrary::getMolHolder ( )
inline

Get the underlying molecule holder implementation.

Definition at line 366 of file SubstructLibrary.h.

References PRECONDITION.

◆ hasMatch() [1/2]

bool RDKit::SubstructLibrary::hasMatch ( const ROMol & query,
bool  recursionPossible = true,
bool  useChirality = true,
bool  useQueryQueryMatches = false,
int  numThreads = -1 
)

Returns true if any match exists for the query.

Parameters
query                 Query to match against molecules
recursionPossible     Flags whether or not recursive matches are allowed [default true]
useChirality          Use atomic CIP codes as part of the comparison [default true]
useQueryQueryMatches  If set, the contents of atom and bond queries will be used as part of the matching [default false]
numThreads            If -1, use all available processors [default -1]

◆ hasMatch() [2/2]

bool RDKit::SubstructLibrary::hasMatch ( const ROMol & query,
unsigned int  startIdx,
unsigned int  endIdx,
bool  recursionPossible = true,
bool  useChirality = true,
bool  useQueryQueryMatches = false,
int  numThreads = -1 
)

Returns true if any match exists for the query between the specified indices.

Parameters
query                 Query to match against molecules
startIdx              Start index of the search
endIdx                Ending index (non-inclusive, as in the ranged getMatches and countMatches) of the search
recursionPossible     Flags whether or not recursive matches are allowed [default true]
useChirality          Use atomic CIP codes as part of the comparison [default true]
useQueryQueryMatches  If set, the contents of atom and bond queries will be used as part of the matching [default false]
numThreads            If -1, use all available processors [default -1]

◆ operator[]()

boost::shared_ptr<ROMol> RDKit::SubstructLibrary::operator[] ( unsigned int  idx)
inline

Returns the molecule at the given index.

Parameters
idx  Index of the molecule in the library

Definition at line 524 of file SubstructLibrary.h.

References RDKit::MolHolderBase::getMol(), and PRECONDITION.

◆ size()

unsigned int RDKit::SubstructLibrary::size ( ) const
inline

Return the number of molecules in the library.

Definition at line 531 of file SubstructLibrary.h.

References PRECONDITION.


The documentation for this class was generated from the following file: