RDKit
Open-source cheminformatics and machine learning.
Diversity picker based on hierarchical clustering.
#include <HierarchicalClusterPicker.h>
Public Types | |
enum | ClusterMethod { WARD = 1, SLINK = 2, CLINK = 3, UPGMA = 4, MCQUITTY = 5, GOWER = 6, CENTROID = 7 } |
The type of hierarchical clustering algorithm to use. | |
Public Member Functions | |
HierarchicalClusterPicker (ClusterMethod clusterMethod) | |
Constructor - takes a ClusterMethod as an argument. | |
RDKit::INT_VECT | pick (const double *distMat, unsigned int poolSize, unsigned int pickSize) const |
This is the function that does the picking. | |
RDKit::VECT_INT_VECT | cluster (const double *distMat, unsigned int poolSize, unsigned int pickSize) const |
This is the function that does the clustering of the items - used by the picker. | |
Public Member Functions inherited from RDPickers::DistPicker | |
DistPicker () | |
Default constructor. | |
virtual | ~DistPicker () |
Diversity picker based on hierarchical clustering.
This class inherits from DistPicker because it uses the distance matrix for diversity picking. The clustering itself is done using the Murtagh code in $RDBASE/Code/ML/Cluster/Murtagh/.
Definition at line 24 of file HierarchicalClusterPicker.h.
The type of hierarchical clustering algorithm to use.
Enumerator | |
---|---|
WARD | |
SLINK | |
CLINK | |
UPGMA | |
MCQUITTY | |
GOWER | |
CENTROID |
Definition at line 29 of file HierarchicalClusterPicker.h.
RDPickers::HierarchicalClusterPicker::HierarchicalClusterPicker (ClusterMethod clusterMethod) | inline explicit |
Constructor - takes a ClusterMethod as an argument.
Sets the hierarchical clustering method.
Definition at line 42 of file HierarchicalClusterPicker.h.
RDKit::VECT_INT_VECT RDPickers::HierarchicalClusterPicker::cluster (const double *distMat, unsigned int poolSize, unsigned int pickSize) const
This is the function that does the clustering of the items - used by the picker.
ARGUMENTS:
distMat | - distance matrix, passed as an array of doubles. Only the lower-triangle elements of the matrix are expected, supplied as a 1D array. NOTE: this matrix WILL BE ALTERED during the picking |
poolSize | - the size of the pool to pick the items from. It is assumed that the distance matrix above contains the right number of elements for a lower triangle without the diagonal, i.e. poolSize*(poolSize-1)/2 |
pickSize | - the number of clusters to divide the pool into (<= poolSize) |
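The packed lower-triangle layout expected for distMat can be sketched as follows. This is a minimal illustration, not RDKit code: it assumes the entries are stored row by row without the diagonal, in the order d(1,0), d(2,0), d(2,1), d(3,0), and so on. The helper names `packedSize`, `packedIndex`, and `packDistances` are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <vector>

// Number of entries in a packed lower triangle (diagonal excluded) for n items.
std::size_t packedSize(std::size_t n) { return n * (n - 1) / 2; }

// Position of d(i, j) in the packed array; argument order does not matter.
// Assumed layout: d(1,0), d(2,0), d(2,1), d(3,0), ...  Requires i != j.
std::size_t packedIndex(std::size_t i, std::size_t j) {
  if (i < j) {
    std::size_t tmp = i;
    i = j;
    j = tmp;
  }
  return i * (i - 1) / 2 + j;
}

// Hypothetical helper: build the packed matrix for points on a line,
// using the absolute coordinate difference as the distance.
std::vector<double> packDistances(const std::vector<double> &coords) {
  std::vector<double> distMat(packedSize(coords.size()));
  for (std::size_t i = 1; i < coords.size(); ++i) {
    for (std::size_t j = 0; j < i; ++j) {
      double d = coords[i] - coords[j];
      distMat[packedIndex(i, j)] = d < 0 ? -d : d;
    }
  }
  return distMat;
}
```

Under this assumed layout, a pool of 4 items needs 6 distances, and the pointer to the first element of the resulting vector is what would be passed as distMat. Since the matrix may be altered by the call, keep a copy if the distances are needed afterwards.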
Referenced by HierarchicalClusterPicker().
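RDKit::INT_VECT RDPickers::HierarchicalClusterPicker::pick (const double *distMat, unsigned int poolSize, unsigned int pickSize) const | virtual |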
This is the function that does the picking.
Here is how the algorithm works
FIX: Supply reference
Once the pool has been divided into clusters, the sum of the squared distances from each item to the other items in its cluster is computed. The item with the smallest sum is picked as the representative of the cluster. In effect, this picks the item closest to the centroid of the cluster.
distMat | - distance matrix, passed as an array of doubles. Only the lower-triangle elements of the matrix are expected, supplied as a 1D array. NOTE: this matrix WILL BE ALTERED during the picking |
poolSize | - the size of the pool to pick the items from. It is assumed that the distance matrix above contains the right number of elements for a lower triangle without the diagonal, i.e. poolSize*(poolSize-1)/2 |
pickSize | - the number of items to pick from the pool (<= poolSize) |
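The representative-selection step described for pick() can be sketched in isolation. This is an illustrative reimplementation under stated assumptions, not the RDKit source: it takes clusters that have already been formed, assumes the packed lower-triangle layout d(1,0), d(2,0), d(2,1), ..., and breaks ties in favour of the first item encountered. The names `packedDist` and `pickRepresentatives` are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Look up d(i, j) in a packed lower triangle (diagonal excluded).
// Assumed layout: d(1,0), d(2,0), d(2,1), ...  Requires i != j.
double packedDist(const std::vector<double> &distMat, int i, int j) {
  if (i < j) {
    int tmp = i;
    i = j;
    j = tmp;
  }
  std::size_t idx = static_cast<std::size_t>(i) * (i - 1) / 2 + j;
  return distMat[idx];
}

// For each cluster, return the member whose sum of squared distances
// to the other members of the same cluster is smallest.
std::vector<int> pickRepresentatives(
    const std::vector<double> &distMat,
    const std::vector<std::vector<int>> &clusters) {
  std::vector<int> picks;
  for (const auto &cluster : clusters) {
    if (cluster.empty()) continue;  // nothing to pick from
    int best = cluster.front();
    double bestSum = std::numeric_limits<double>::max();
    for (int i : cluster) {
      double sum = 0.0;
      for (int j : cluster) {
        if (i == j) continue;
        double d = packedDist(distMat, i, j);
        sum += d * d;
      }
      if (sum < bestSum) {  // strict comparison: first item wins ties
        bestSum = sum;
        best = i;
      }
    }
    picks.push_back(best);
  }
  return picks;
}
```

For five points on a line at 0, 1, 2, 10 and 11, grouped as {0, 1, 2} and {3, 4}, this sketch selects item 1 from the first cluster (sum of squares 2, against 5 for its neighbours) and item 3 from the second, where the two members tie.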
Implements RDPickers::DistPicker.
Referenced by HierarchicalClusterPicker().