Description of data files in this folder

Solubility dataset

  • solubility.test.sdf (257 records)

  • solubility.train.sdf (1025 records)
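
The record counts listed above can be checked by counting the "$$$$" lines that terminate each record in an SDF file. The following is a minimal sketch in plain Python, not a full SDF parser. The helper name count_sdf_records is hypothetical, and it is an assumption that both files sit in the current working directory.

```python
from pathlib import Path


def count_sdf_records(path):
    """Count records in an SDF file by counting '$$$$' delimiter lines."""
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # Each SDF record ends with a line consisting only of '$$$$'.
            if line.strip() == "$$$$":
                count += 1
    return count


if __name__ == "__main__":
    # File names come from the list above; their location in the
    # current directory is an assumption.
    for name in ("solubility.train.sdf", "solubility.test.sdf"):
        if Path(name).exists():
            print(name, count_sdf_records(name))
        else:
            print(name, "not found")
```

If the printed counts differ from the numbers listed above, the files may have been truncated or modified after download.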

The two SDF files (hereafter the “solubility dataset”) originate from the Huuskonen dataset, which contains a training set of 884 compounds and a randomly chosen test set of 413 compounds.

  • Reference: Huuskonen, J. (2000). Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology. Journal of Chemical Information and Computer Sciences, 40(3), 773–777. https://doi.org/10.1021/ci9901338

This solubility dataset was originally downloaded from

  • http://cheminformatics.org/datasets/huuskonen/index.html

Although cheminformatics.org no longer exists, the supplementary file at https://doi.org/10.1021/ci9901338 lists all the structures and the corresponding data in PDF format.