RDKit
Open-source cheminformatics and machine learning.
Embedder.h
//
//  Copyright (C) 2004-2017 Greg Landrum and Rational Discovery LLC
//
//   @@ All Rights Reserved @@
//  This file is part of the RDKit.
//  The contents are covered by the terms of the BSD license
//  which is included in the file license.txt, found at the root
//  of the RDKit source tree.
//

#include <RDGeneral/export.h>
#ifndef RD_EMBEDDER_H_GUARD
#define RD_EMBEDDER_H_GUARD

#include <map>
#include <utility>
#include <Geometry/point.h>
#include <GraphMol/ROMol.h>
#include <boost/shared_ptr.hpp>
#include <DistGeom/BoundsMatrix.h>

namespace RDKit {
namespace DGeomHelpers {

//! Parameter object for controlling embedding
/*!
  numConfs  Number of conformations to be generated
  numThreads  Sets the number of threads to use (more than one thread
      will only be used if the RDKit was built with multithread
      support). If set to zero, the max supported by the system will
      be used.
  maxIterations  Max. number of times the embedding will be tried if
      coordinates are not obtained successfully. If set to zero (the
      default), 10x the number of atoms is used.
  randomSeed  Provides a seed for the random number generator (so that
      the same coordinates can be obtained for a molecule on multiple
      runs). If -1, the RNG will not be seeded.
  clearConfs  Clear all existing conformations on the molecule
  useRandomCoords  Start the embedding from random coordinates instead of
      using eigenvalues of the distance matrix.
  boxSizeMult  Determines the size of the box that is used for
      random coordinates. If this is a positive number, the
      side length will equal the largest element of the distance
      matrix times \c boxSizeMult. If this is a negative number,
      the side length will equal \c -boxSizeMult (i.e. independent
      of the elements of the distance matrix).
  randNegEig  Picks coordinates at random when an embedding process
      produces negative eigenvalues
  numZeroFail  Fail embedding if we find this many or more zero eigenvalues
      (within a tolerance)
  pruneRmsThresh  Retain only the conformations out of 'numConfs' after
      embedding that are at least this far apart from each other.
      RMSD is computed on the heavy atoms.
      Pruning is greedy; i.e. the first embedded conformation is
      retained and from then on only those that are at least
      \c pruneRmsThresh away from already retained conformations
      are kept. The pruning is done after embedding and bounds
      violation minimization. No pruning by default.
  coordMap  A map of int to Point3D, between atom IDs and their
      locations. If this container is provided, the
      coordinates are used to set distance constraints on the
      embedding. The resulting conformer(s) should have distances
      between the specified atoms that reproduce those between the
      points in \c coordMap. Because the embedding produces a
      molecule in an arbitrary reference frame, an alignment step
      is required to actually reproduce the provided coordinates.
  optimizerForceTol  Set the tolerance on forces in the DGeom optimizer
      (this shouldn't normally be altered in client code).
  ignoreSmoothingFailures  Try to embed the molecule even if triangle bounds
      smoothing fails
  enforceChirality  Enforce the correct chirality if chiral centers are
      present
  useExpTorsionAnglePrefs  Impose experimental torsion-angle preferences
  useBasicKnowledge  Impose "basic knowledge" terms such as flat
      aromatic rings, ketones, etc.
  ETversion  Version of the experimental torsion-angle preferences
  verbose  Print output of experimental torsion-angle preferences
  basinThresh  Set the basin threshold for the DGeom force field
      (this shouldn't normally be altered in client code).
  onlyHeavyAtomsForRMS  Only use the heavy atoms when doing RMS filtering
  boundsMat  Custom bounds matrix to specify upper and lower bounds of atom
      pairs
  embedFragmentsSeparately  Embed each fragment of the molecule in turn
  useSmallRingTorsions  Optional torsions to improve small ring conformer
      sampling
  useMacrocycleTorsions  Optional torsions to improve macrocycle conformer
      sampling
  useMacrocycle14config  Whether the 1-4 distance bounds heuristics for
      macrocycles are used
  CPCI  Custom Coulombic interactions between atom pairs
  callback  Pointer to a function (returning void) for reporting progress;
      it will be called with the current iteration number.
  forceTransAmides  Constrain amide bonds to be trans.
  useSymmetryForPruning  Use molecule symmetry when doing the RMSD pruning.
      NOTE that for reasons of computational efficiency,
      setting this will also set onlyHeavyAtomsForRMS to
      true.
  trackFailures  Keep track of which checks during the embedding process
      fail
  failures  If trackFailures is true, this is used to track the number
      of times each embedding check fails
  enableSequentialRandomSeeds  Handle the random number seeds so that
      conformer generation can be restarted
*/
struct RDKIT_DISTGEOMHELPERS_EXPORT EmbedParameters {
  unsigned int maxIterations{0};
  int numThreads{1};
  int randomSeed{-1};
  bool clearConfs{true};
  bool useRandomCoords{false};
  double boxSizeMult{2.0};
  bool randNegEig{true};
  unsigned int numZeroFail{1};
  const std::map<int, RDGeom::Point3D> *coordMap{nullptr};
  double optimizerForceTol{1e-3};
  bool ignoreSmoothingFailures{false};
  bool enforceChirality{true};
  bool useExpTorsionAnglePrefs{false};
  bool useBasicKnowledge{false};
  bool verbose{false};
  double basinThresh{5.0};
  double pruneRmsThresh{-1.0};
  bool onlyHeavyAtomsForRMS{false};
  unsigned int ETversion{1};
  boost::shared_ptr<const DistGeom::BoundsMatrix> boundsMat;
  bool embedFragmentsSeparately{true};
  bool useSmallRingTorsions{false};
  bool useMacrocycleTorsions{false};
  bool useMacrocycle14config{false};
  std::shared_ptr<std::map<std::pair<unsigned int, unsigned int>, double>> CPCI;
  void (*callback)(unsigned int);
  bool forceTransAmides{true};
  bool useSymmetryForPruning{true};
  double boundsMatForceScaling{1.0};
  bool trackFailures{false};
  std::vector<unsigned int> failures;
  bool enableSequentialRandomSeeds{false};

  EmbedParameters() : boundsMat(nullptr), CPCI(nullptr), callback(nullptr) {}
  EmbedParameters(
      unsigned int maxIterations, int numThreads, int randomSeed,
      bool clearConfs, bool useRandomCoords, double boxSizeMult,
      bool randNegEig, unsigned int numZeroFail,
      const std::map<int, RDGeom::Point3D> *coordMap, double optimizerForceTol,
      bool ignoreSmoothingFailures, bool enforceChirality,
      bool useExpTorsionAnglePrefs, bool useBasicKnowledge, bool verbose,
      double basinThresh, double pruneRmsThresh, bool onlyHeavyAtomsForRMS,
      unsigned int ETversion = 1,
      const DistGeom::BoundsMatrix *boundsMat = nullptr,
      bool embedFragmentsSeparately = true, bool useSmallRingTorsions = false,
      bool useMacrocycleTorsions = false, bool useMacrocycle14config = false,
      std::shared_ptr<std::map<std::pair<unsigned int, unsigned int>, double>>
          CPCI = nullptr,
      void (*callback)(unsigned int) = nullptr)
      : maxIterations(maxIterations),
        numThreads(numThreads),
        randomSeed(randomSeed),
        clearConfs(clearConfs),
        useRandomCoords(useRandomCoords),
        boxSizeMult(boxSizeMult),
        randNegEig(randNegEig),
        numZeroFail(numZeroFail),
        coordMap(coordMap),
        optimizerForceTol(optimizerForceTol),
        ignoreSmoothingFailures(ignoreSmoothingFailures),
        enforceChirality(enforceChirality),
        useExpTorsionAnglePrefs(useExpTorsionAnglePrefs),
        useBasicKnowledge(useBasicKnowledge),
        verbose(verbose),
        basinThresh(basinThresh),
        pruneRmsThresh(pruneRmsThresh),
        onlyHeavyAtomsForRMS(onlyHeavyAtomsForRMS),
        ETversion(ETversion),
        boundsMat(boundsMat),
        embedFragmentsSeparately(embedFragmentsSeparately),
        useSmallRingTorsions(useSmallRingTorsions),
        useMacrocycleTorsions(useMacrocycleTorsions),
        useMacrocycle14config(useMacrocycle14config),
        CPCI(std::move(CPCI)),
        callback(callback) {}
};

//! update parameters from a JSON string
RDKIT_DISTGEOMHELPERS_EXPORT void updateEmbedParametersFromJSON(
    EmbedParameters &params, const std::string &json);

//! Embed multiple conformations for a molecule
RDKIT_DISTGEOMHELPERS_EXPORT void EmbedMultipleConfs(ROMol &mol, INT_VECT &res,
                                                     unsigned int numConfs,
                                                     EmbedParameters &params);
inline INT_VECT EmbedMultipleConfs(ROMol &mol, unsigned int numConfs,
                                   EmbedParameters &params) {
  INT_VECT res;
  EmbedMultipleConfs(mol, res, numConfs, params);
  return res;
}

//! Compute an embedding (in 3D) for the specified molecule using Distance
//! Geometry
inline int EmbedMolecule(ROMol &mol, EmbedParameters &params) {
  INT_VECT confIds;
  EmbedMultipleConfs(mol, confIds, 1, params);

  int res;
  if (confIds.size()) {
    res = confIds[0];
  } else {
    res = -1;
  }
  return res;
}

//! Compute an embedding (in 3D) for the specified molecule using Distance
//! Geometry
/*!
  The following operations are performed (in order) here:
   -# Build a distance bounds matrix based on the topology, including 1-5
      distances but not vdW scaling
   -# Triangle smooth this bounds matrix
   -# If step 2 fails - repeat step 1, this time without 1-5 bounds and with
      vdW scaling, and repeat step 2
   -# Pick a distance matrix at random using the bounds matrix
   -# Compute initial coordinates from the distance matrix
   -# Repeat steps 4 and 5 until maxIterations is reached or embedding is
      successful
   -# Adjust initial coordinates by minimizing a Distance Violation error
      function

  **NOTE**: if the molecule has multiple fragments, they will be embedded
  separately; this means that they will likely occupy the same region of
  space.

  \param mol  Molecule of interest
  \param maxIterations  Max. number of times the embedding will be tried if
      coordinates are not obtained successfully. If set to zero (the
      default), 10x the number of atoms is used.
  \param seed  Provides a seed for the random number generator (so that
      the same coordinates can be obtained for a molecule on
      multiple runs). If negative, the RNG will not be seeded.
  \param clearConfs  Clear all existing conformations on the molecule
  \param useRandomCoords  Start the embedding from random coordinates instead
      of using eigenvalues of the distance matrix.
  \param boxSizeMult  Determines the size of the box that is used for
      random coordinates. If this is a positive number, the
      side length will equal the largest element of the
      distance matrix times \c boxSizeMult. If this is a
      negative number, the side length will equal
      \c -boxSizeMult (i.e. independent of the elements of the
      distance matrix).
  \param randNegEig  Picks coordinates at random when an embedding process
      produces negative eigenvalues
  \param numZeroFail  Fail embedding if we find this many or more zero
      eigenvalues (within a tolerance)
  \param coordMap  A map of int to Point3D, between atom IDs and their
      locations. If this container is provided, the
      coordinates are used to set distance constraints on the
      embedding. The resulting conformer(s) should have distances
      between the specified atoms that reproduce those between the
      points in \c coordMap. Because the embedding produces a
      molecule in an arbitrary reference frame, an alignment step
      is required to actually reproduce the provided coordinates.
  \param optimizerForceTol  Set the tolerance on forces in the DGeom
      optimizer (this shouldn't normally be altered in client code).
  \param ignoreSmoothingFailures  Try to embed the molecule even if triangle
      bounds smoothing fails
  \param enforceChirality  Enforce the correct chirality if chiral centers
      are present
  \param useExpTorsionAnglePrefs  Impose experimental torsion-angle
      preferences
  \param useBasicKnowledge  Impose "basic knowledge" terms such as flat
      aromatic rings, ketones, etc.
  \param verbose  Print output of experimental torsion-angle preferences
  \param basinThresh  Set the basin threshold for the DGeom force field
      (this shouldn't normally be altered in client code).
  \param onlyHeavyAtomsForRMS  Only use the heavy atoms when doing RMS
      filtering
  \param ETversion  Version of torsion preferences to use
  \param useSmallRingTorsions  Optional torsions to improve small ring
      conformer sampling
  \param useMacrocycleTorsions  Optional torsions to improve macrocycle
      conformer sampling
  \param useMacrocycle14config  Whether the 1-4 distance bounds heuristics
      for macrocycles are used

  \return ID of the conformation added to the molecule, -1 if the embedding
      failed
*/
inline int EmbedMolecule(
    ROMol &mol, unsigned int maxIterations = 0, int seed = -1,
    bool clearConfs = true, bool useRandomCoords = false,
    double boxSizeMult = 2.0, bool randNegEig = true,
    unsigned int numZeroFail = 1,
    const std::map<int, RDGeom::Point3D> *coordMap = nullptr,
    double optimizerForceTol = 1e-3, bool ignoreSmoothingFailures = false,
    bool enforceChirality = true, bool useExpTorsionAnglePrefs = false,
    bool useBasicKnowledge = false, bool verbose = false,
    double basinThresh = 5.0, bool onlyHeavyAtomsForRMS = false,
    unsigned int ETversion = 1, bool useSmallRingTorsions = false,
    bool useMacrocycleTorsions = false, bool useMacrocycle14config = false) {
  EmbedParameters params(
      maxIterations, 1, seed, clearConfs, useRandomCoords, boxSizeMult,
      randNegEig, numZeroFail, coordMap, optimizerForceTol,
      ignoreSmoothingFailures, enforceChirality, useExpTorsionAnglePrefs,
      useBasicKnowledge, verbose, basinThresh, -1.0, onlyHeavyAtomsForRMS,
      ETversion, nullptr, true, useSmallRingTorsions, useMacrocycleTorsions,
      useMacrocycle14config);
  return EmbedMolecule(mol, params);
}

//! Embed multiple conformations for a molecule
/*!
  This is roughly equivalent to calling EmbedMolecule multiple times, except
  that the bounds matrix is computed only once from the topology.

  **NOTE**: if the molecule has multiple fragments, they will be embedded
  separately; this means that they will likely occupy the same region of
  space.

  \param mol  Molecule of interest
  \param res  Used to return the resulting conformer ids
  \param numConfs  Number of conformations to be generated
  \param numThreads  Sets the number of threads to use (more than one thread
      will only be used if the RDKit was built with multithread
      support). If set to zero, the max supported by the system
      will be used.
  \param maxIterations  Max. number of times the embedding will be tried if
      coordinates are not obtained successfully. The default here is 30;
      if set to zero, 10x the number of atoms is used.
  \param seed  Provides a seed for the random number generator (so that
      the same coordinates can be obtained for a molecule on
      multiple runs). If negative, the RNG will not be seeded.
  \param clearConfs  Clear all existing conformations on the molecule
  \param useRandomCoords  Start the embedding from random coordinates instead
      of using eigenvalues of the distance matrix.
  \param boxSizeMult  Determines the size of the box that is used for
      random coordinates. If this is a positive number, the
      side length will equal the largest element of the
      distance matrix times \c boxSizeMult. If this is a
      negative number, the side length will equal
      \c -boxSizeMult (i.e. independent of the elements of the
      distance matrix).
  \param randNegEig  Picks coordinates at random when an embedding process
      produces negative eigenvalues
  \param numZeroFail  Fail embedding if we find this many or more zero
      eigenvalues (within a tolerance)
  \param pruneRmsThresh  Retain only the conformations out of 'numConfs'
      after embedding that are at least this far apart from each
      other. RMSD is computed on the heavy atoms.
      Pruning is greedy; i.e. the first embedded conformation
      is retained and from then on only those that are at least
      \c pruneRmsThresh away from already retained conformations
      are kept. The pruning is done after embedding and
      bounds violation minimization. No pruning by default.
  \param coordMap  A map of int to Point3D, between atom IDs and their
      locations. If this container is provided, the
      coordinates are used to set distance constraints on the
      embedding. The resulting conformer(s) should have distances
      between the specified atoms that reproduce those between the
      points in \c coordMap. Because the embedding produces a
      molecule in an arbitrary reference frame, an alignment step
      is required to actually reproduce the provided coordinates.
  \param optimizerForceTol  Set the tolerance on forces in the DGeom
      optimizer (this shouldn't normally be altered in client code).
  \param ignoreSmoothingFailures  Try to embed the molecule even if triangle
      bounds smoothing fails
  \param enforceChirality  Enforce the correct chirality if chiral centers
      are present
  \param useExpTorsionAnglePrefs  Impose experimental torsion-angle
      preferences
  \param useBasicKnowledge  Impose "basic knowledge" terms such as flat
      aromatic rings, ketones, etc.
  \param verbose  Print output of experimental torsion-angle preferences
  \param basinThresh  Set the basin threshold for the DGeom force field
      (this shouldn't normally be altered in client code).
  \param onlyHeavyAtomsForRMS  Only use the heavy atoms when doing RMS
      filtering
  \param ETversion  Version of torsion preferences to use
  \param useSmallRingTorsions  Optional torsions to improve small ring
      conformer sampling
  \param useMacrocycleTorsions  Optional torsions to improve macrocycle
      conformer sampling
  \param useMacrocycle14config  Whether the 1-4 distance bounds heuristics
      for macrocycles are used
*/
inline void EmbedMultipleConfs(
    ROMol &mol, INT_VECT &res, unsigned int numConfs = 10, int numThreads = 1,
    unsigned int maxIterations = 30, int seed = -1, bool clearConfs = true,
    bool useRandomCoords = false, double boxSizeMult = 2.0,
    bool randNegEig = true, unsigned int numZeroFail = 1,
    double pruneRmsThresh = -1.0,
    const std::map<int, RDGeom::Point3D> *coordMap = nullptr,
    double optimizerForceTol = 1e-3, bool ignoreSmoothingFailures = false,
    bool enforceChirality = true, bool useExpTorsionAnglePrefs = false,
    bool useBasicKnowledge = false, bool verbose = false,
    double basinThresh = 5.0, bool onlyHeavyAtomsForRMS = false,
    unsigned int ETversion = 1, bool useSmallRingTorsions = false,
    bool useMacrocycleTorsions = false, bool useMacrocycle14config = false) {
  EmbedParameters params(
      maxIterations, numThreads, seed, clearConfs, useRandomCoords, boxSizeMult,
      randNegEig, numZeroFail, coordMap, optimizerForceTol,
      ignoreSmoothingFailures, enforceChirality, useExpTorsionAnglePrefs,
      useBasicKnowledge, verbose, basinThresh, pruneRmsThresh,
      onlyHeavyAtomsForRMS, ETversion, nullptr, true, useSmallRingTorsions,
      useMacrocycleTorsions, useMacrocycle14config);
  EmbedMultipleConfs(mol, res, numConfs, params);
}
//! \overload
inline INT_VECT EmbedMultipleConfs(
    ROMol &mol, unsigned int numConfs = 10, unsigned int maxIterations = 30,
    int seed = -1, bool clearConfs = true, bool useRandomCoords = false,
    double boxSizeMult = 2.0, bool randNegEig = true,
    unsigned int numZeroFail = 1, double pruneRmsThresh = -1.0,
    const std::map<int, RDGeom::Point3D> *coordMap = nullptr,
    double optimizerForceTol = 1e-3, bool ignoreSmoothingFailures = false,
    bool enforceChirality = true, bool useExpTorsionAnglePrefs = false,
    bool useBasicKnowledge = false, bool verbose = false,
    double basinThresh = 5.0, bool onlyHeavyAtomsForRMS = false,
    unsigned int ETversion = 1, bool useSmallRingTorsions = false,
    bool useMacrocycleTorsions = false, bool useMacrocycle14config = false) {
  EmbedParameters params(
      maxIterations, 1, seed, clearConfs, useRandomCoords, boxSizeMult,
      randNegEig, numZeroFail, coordMap, optimizerForceTol,
      ignoreSmoothingFailures, enforceChirality, useExpTorsionAnglePrefs,
      useBasicKnowledge, verbose, basinThresh, pruneRmsThresh,
      onlyHeavyAtomsForRMS, ETversion, nullptr, true, useSmallRingTorsions,
      useMacrocycleTorsions, useMacrocycle14config);
  INT_VECT res;
  EmbedMultipleConfs(mol, res, numConfs, params);
  return res;
}

//! Parameters corresponding to Sereina Riniker's KDG approach
RDKIT_DISTGEOMHELPERS_EXPORT extern const EmbedParameters KDG;
//! Parameters corresponding to Sereina Riniker's ETDG approach
RDKIT_DISTGEOMHELPERS_EXPORT extern const EmbedParameters ETDG;
//! Parameters corresponding to Sereina Riniker's ETKDG approach
RDKIT_DISTGEOMHELPERS_EXPORT extern const EmbedParameters ETKDG;
//! Parameters corresponding to Sereina Riniker's ETKDG approach - version 2
RDKIT_DISTGEOMHELPERS_EXPORT extern const EmbedParameters ETKDGv2;
//! Parameters corresponding to the improved ETKDG by Wang, Witek, Landrum
//! and Riniker (10.1021/acs.jcim.0c00025) - the macrocycle part
RDKIT_DISTGEOMHELPERS_EXPORT extern const EmbedParameters ETKDGv3;
//! Parameters corresponding to the improved ETKDG by Wang, Witek, Landrum
//! and Riniker (10.1021/acs.jcim.0c00025) - the small ring part
RDKIT_DISTGEOMHELPERS_EXPORT extern const EmbedParameters srETKDGv3;
}  // namespace DGeomHelpers
}  // namespace RDKit

#endif
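
The greedy pruning described for pruneRmsThresh can be illustrated with a short standalone sketch. This is not RDKit's implementation: the `Point` and `Conf` types and the `rmsd` and `greedyPrune` functions are hypothetical names introduced here, and the RMSD is computed on raw coordinates without alignment or symmetry handling, an assumption made only to keep the example small.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal types for illustration only; RDKit itself works with
// RDGeom::Point3D and Conformer objects.
struct Point {
  double x, y, z;
};
using Conf = std::vector<Point>;

// Plain coordinate RMSD between two conformers with identical atom ordering.
// Assumption: no alignment is performed before comparing coordinates.
double rmsd(const Conf &a, const Conf &b) {
  assert(a.size() == b.size() && !a.empty());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double dx = a[i].x - b[i].x;
    const double dy = a[i].y - b[i].y;
    const double dz = a[i].z - b[i].z;
    sum += dx * dx + dy * dy + dz * dz;
  }
  return std::sqrt(sum / static_cast<double>(a.size()));
}

// Greedy pruning: the first conformer is always kept; each later conformer is
// kept only if it is at least `thresh` away from every conformer kept so far.
// A non-positive threshold disables pruning, mirroring the -1.0 default.
std::vector<std::size_t> greedyPrune(const std::vector<Conf> &confs,
                                     double thresh) {
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < confs.size(); ++i) {
    bool keep = true;
    if (thresh > 0.0) {
      for (std::size_t k : kept) {
        if (rmsd(confs[i], confs[k]) < thresh) {
          keep = false;
          break;
        }
      }
    }
    if (keep) {
      kept.push_back(i);
    }
  }
  return kept;
}
```

Calling `greedyPrune(confs, 0.5)` returns the indices of the retained conformers in embedding order. Because the procedure is greedy, the result depends on that order: a conformer is only ever compared with those already retained.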
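
The boxSizeMult and randomSeed descriptions can be made concrete with a small standalone sketch. It is illustrative only and does not reproduce RDKit's code: `Point`, `boxSideLength` and `randomCoords` are hypothetical names, centering the box on the origin is an assumption, and seeding from `std::random_device` when the seed is negative is just one way to model "the RNG will not be seeded".

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical point type for illustration only.
struct Point {
  double x, y, z;
};

// Side length of the random-coordinate box, following the documented rule:
// positive boxSizeMult scales the largest distance-matrix element, negative
// boxSizeMult gives a fixed side length of -boxSizeMult.
// The documentation does not cover boxSizeMult == 0; this sketch returns 0.
double boxSideLength(double largestDistance, double boxSizeMult) {
  return boxSizeMult > 0.0 ? largestDistance * boxSizeMult : -boxSizeMult;
}

// Draw nAtoms points uniformly from a cube of the given side length.
// Assumptions: the cube is centered on the origin, and a negative seed means
// "not reproducible", modeled here by seeding from std::random_device.
std::vector<Point> randomCoords(std::size_t nAtoms, double side, int seed) {
  std::mt19937 rng;
  if (seed >= 0) {
    rng.seed(static_cast<unsigned int>(seed));
  } else {
    rng.seed(std::random_device{}());
  }
  std::uniform_real_distribution<double> dist(-side / 2.0, side / 2.0);
  std::vector<Point> pts;
  pts.reserve(nAtoms);
  for (std::size_t i = 0; i < nAtoms; ++i) {
    const double x = dist(rng);
    const double y = dist(rng);
    const double z = dist(rng);
    pts.push_back(Point{x, y, z});
  }
  return pts;
}
```

With a largest distance of 10.0, the default boxSizeMult of 2.0 gives a side length of 20.0, while a boxSizeMult of -5.0 gives a side length of 5.0 regardless of the distance matrix. Two calls with the same non-negative seed yield identical coordinates, which is the reproducibility that randomSeed is documented to provide.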