# Description of data files in this folder
## Solubility dataset

- solubility.test.sdf (257 records)
- solubility.train.sdf (1025 records)

The two SDF files (hereafter the “solubility dataset”) originate from the Huuskonen dataset, which contains a training set of 884 compounds and a randomly chosen test set of 413 compounds.

- Reference: Huuskonen, J. (2000). Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology. Journal of Chemical Information and Computer Sciences, 40(3), 773–777. https://doi.org/10.1021/ci9901338

This solubility dataset was originally downloaded from:

- http://cheminformatics.org/datasets/huuskonen/index.html

Although cheminformatics.org no longer exists, the supplementary file at https://doi.org/10.1021/ci9901338 lists all the structures and the corresponding data in PDF format.
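
The record counts listed for the two SDF files can be sanity-checked without any cheminformatics library, because each record in an SDF file ends with a line containing only `$$$$`. The following is a minimal sketch under that assumption. The helper name `count_sdf_records` is hypothetical, and the script assumes it is run from the folder that holds the two files.

```python
from pathlib import Path


def count_sdf_records(path):
    """Count records in an SDF file by counting '$$$$' delimiter lines.

    Hypothetical helper for illustration; it does not parse or validate
    the molecular structures themselves.
    """
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # Each SDF record is terminated by a line containing only '$$$$'.
            if line.strip() == "$$$$":
                count += 1
    return count


if __name__ == "__main__":
    # File names taken from the list at the top of this section.
    for name in ("solubility.test.sdf", "solubility.train.sdf"):
        if Path(name).exists():
            print(name, count_sdf_records(name))
        else:
            print(name, "not found in the current directory")
```

If the files are unmodified, the printed counts should agree with the record counts given above. A toolkit such as RDKit would be needed to actually load the structures and their properties.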