Description of data files in this folder
Solubility dataset
solubility.test.sdf (257 records)
solubility.train.sdf (1025 records)
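The record counts listed above can be checked without a cheminformatics toolkit, because each record in an SDF file ends with a line containing only `$$$$`. The following is a minimal sketch using only the Python standard library. The function names `count_sdf_records` and `count_sdf_file` are hypothetical helpers introduced for illustration, and the file paths in the usage comment assume the script is run from this folder.

```python
from typing import Iterable


def count_sdf_records(lines: Iterable[str]) -> int:
    """Count SDF records by counting the '$$$$' delimiter lines."""
    return sum(1 for line in lines if line.strip() == "$$$$")


def count_sdf_file(path: str) -> int:
    """Open an SDF file as text and count its records."""
    # errors="replace" is a defensive assumption in case the file
    # contains bytes that are not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_sdf_records(handle)


# Example usage (assumes both files are in the current directory):
#   print(count_sdf_file("solubility.train.sdf"))
#   print(count_sdf_file("solubility.test.sdf"))
```

If the files are intact, the two counts should agree with the numbers given in the file list. Parsing the structures themselves would need a dedicated toolkit, which is outside the scope of this sketch.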
The two SDF files (hereafter the “solubility dataset”) originate from the Huuskonen dataset, which contains a training set of 884 compounds and a randomly chosen test set of 413 compounds. Note that the record counts of the two files listed here (1025 and 257) differ from that original split.
Reference: Huuskonen, J. (2000). Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology. Journal of Chemical Information and Computer Sciences, 40(3), 773–777. https://doi.org/10.1021/ci9901338
This solubility dataset was originally downloaded from
http://cheminformatics.org/datasets/huuskonen/index.html
Although cheminformatics.org is no longer available, the supplementary file at https://doi.org/10.1021/ci9901338 lists all the structures and the corresponding data in PDF format.