command line utility for screening composite models

**Usage**

_ScreenComposite [optional args] modelfile(s) datafile_

Unless indicated otherwise (via command line arguments), _modelfile_ is a file containing a pickled composite model and _datafile_ is a QDAT file.

**Command Line Arguments**

- -t *threshold value(s)*: use high-confidence predictions for the final analysis of the hold-out data. The threshold value can be either a single float or a list/tuple of floats. All thresholds should be between 0.0 and 1.0.
- -D: do a detailed screen.
- -d *database name*: instead of reading the data from a QDAT file, pull it from a database. In this case, the _datafile_ argument provides the name of the database table containing the data set.
- -N *note*: use all models from the database which have this note. The _modelfile_ argument should contain the name of the table with the models.
- -H: screen only the hold-out set (works only if a version of BuildComposite more recent than 1.2.2 was used).
- -T: screen only the training set (works only if a version of BuildComposite more recent than 1.2.2 was used).
- -E: do a detailed Error analysis. This shows each misclassified point and the number of times it was missed across all screened composites. If the --enrich argument is also provided, only compounds whose true activity value equals the enrichment value will be used.
- --enrich *enrichVal*: target "active" value to be used in calculating enrichments.
- -A: show All predictions.
- -S: shuffle activity values before screening.
- -R: randomize activity values before screening.
- -F *filter frac*: filters the data before training to change the distribution of activity values in the training set. *filter frac* is the fraction of the training set that should have the target value. **See the note in the BuildComposite help about data filtering.**
- -v *filter value*: filters the data before training to change the distribution of activity values in the training set. *filter value* is the target value to use in filtering. **See the note in the BuildComposite help about data filtering.**
- -V: be verbose when screening multiple models.
- -h: show this message and exit.
- --OOB: do an "out-of-bag" generalization error estimate. This only makes sense when applied to the original data set.
- --pickleCol *colId*: index of the column containing a pickled value (used primarily for cases where fingerprints are used as descriptors).

*** Options for making Prediction (Hanneke) Plots ***

- --predPlot=<fileName>: triggers the generation of a Hanneke plot and sets the name of the .txt file which will hold the output data. A Gnuplot control file, <fileName>.gnu, will also be generated.
- --predActTable=<name> (optional): name of the database table containing activity values. If this is not provided, activities will be read from the same table containing the screening data.
- --predActCol=<name> (optional): name of the activity column. If not provided, the name of the last column in the activity table will be used.
- --predLogScale (optional): if provided, the x axis of the prediction plot (the activity axis) will be plotted using a log scale.
- --predShow: launch a gnuplot instance and display the prediction plot (the plot will still be written to disk).

*** The following options are likely obsolete ***

- -P: read pickled data. The datafile argument should contain a pickled data set. *relevant only to qdat files*
- -q: data are not quantized (the composite should take care of quantization itself if it requires quantized data). *relevant only to qdat files*
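Two hypothetical invocations combining the flags above (file, table, and note names here are made up for illustration):

```
# detailed screen of a pickled composite against a QDAT file, keeping
# only predictions with confidence of at least 0.75
ScreenComposite -D -t 0.75 model.pkl data.qdat

# read the data set from a database table instead, and screen every
# composite in the models table that carries the note "run1"
ScreenComposite -d myDatabase -N run1 models_table data_table
```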
**Variables**

- hasPil = 1
- _details = CompositeRun.CompositeRun()
- __VERSION_STRING = "3.3.0"

**Imports**: sys, copy, numpy, cPickle, RDConfig, DataStructs, Image, ImageDraw, DataUtils, SplitData, CompositeRun, DbConnect, DbModule
emits messages to _sys.stdout_; override this in modules which import this one to redirect output

**Arguments**

- msg: the string to be displayed
emits messages to _sys.stderr_; override this in modules which import this one to redirect output

**Arguments**

- msg: the string to be displayed
screens a set of examples through a composite and returns the results

**Arguments**

- examples: the examples to be screened (a sequence of sequences); it's assumed that the last element in each example is its "value"
- composite: the composite model to be used
- callback: (optional) if provided, this should be a function taking a single argument that is called, after each example is screened, with the number of examples screened so far as the argument
- appendExamples: (optional) this value is passed on to the composite's _ClassifyExample()_ method
- errorEstimate: (optional) calculate the "out-of-bag" error estimate for the composite using Breiman's definition. This only makes sense when screening the original data set! [L. Breiman, "Out-of-bag Estimation", UC Berkeley Dept. of Statistics Technical Report (1996)]

**Returns**

a list of 3-tuples _nExamples_ long:

1. answer: the value from the example
2. pred: the composite model's prediction
3. conf: the confidence of the composite
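Because the return value is a plain list of (answer, pred, conf) 3-tuples, downstream bookkeeping is straightforward. A minimal sketch, assuming `results` holds such a list (the `summarize` helper is illustrative, not part of this module):

```python
def summarize(results, threshold=0.0):
    """Tally a list of (answer, pred, conf) 3-tuples, rejecting any
    prediction whose confidence falls below `threshold`."""
    nGood = nBad = nSkip = 0
    for answer, pred, conf in results:
        if conf < threshold:
            nSkip += 1   # low-confidence prediction: rejected
        elif pred == answer:
            nGood += 1   # correct prediction
        else:
            nBad += 1    # incorrect prediction
    nKept = nGood + nBad
    return nGood, nBad, nSkip, (nGood / nKept if nKept else 0.0)

# e.g. summarize([(1, 1, 0.9), (0, 1, 0.6), (1, 0, 0.1)], threshold=0.5)
# -> (1, 1, 1, 0.5)
```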
screens a set of examples across a composite and breaks the predictions into *correct*, *incorrect* and *unclassified* sets

**Arguments**

- examples: the examples to be screened (a sequence of sequences); it's assumed that the last element in each example is its "value"
- composite: the composite model to be used
- threshold: (optional) the threshold to be used to decide whether or not a given prediction should be kept
- screenResults: (optional) the results of a previous screen (a sequence of 3-tuples in the format returned by _CollectResults()_). If this is provided, the examples will not be screened again.
- goodVotes, badVotes, noVotes: (optional) if provided, these should be lists (or anything supporting an _append()_ method) which will be used to pass the screening results back
- callback: (optional) if provided, this should be a function taking a single argument that is called, after each example is screened, with the number of examples screened so far as the argument
- appendExamples: (optional) this value is passed on to the composite's _ClassifyExample()_ method
- errorEstimate: (optional) calculate the "out-of-bag" error estimate for the composite using Breiman's definition. This only makes sense when screening the original data set! [L. Breiman, "Out-of-bag Estimation", UC Berkeley Dept. of Statistics Technical Report (1996)]

**Notes**

- since this function doesn't return anything, if one or more of the arguments _goodVotes_, _badVotes_, and _noVotes_ is not provided, there's not much reason to call it
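The partitioning described above is easy to picture in isolation. A minimal sketch of the same bookkeeping, taking results in the 3-tuple format returned by _CollectResults()_ and producing votes in the 4-tuple convention (answer, prediction, confidence, index) described later on this page; which side of the cutoff gets rejected is an assumption here:

```python
def partitionVotes(screenResults, threshold=0.0):
    """Split (answer, pred, conf) screening results into correct,
    incorrect, and unclassified vote lists (illustrative only)."""
    goodVotes, badVotes, noVotes = [], [], []
    for idx, (answer, pred, conf) in enumerate(screenResults):
        vote = (answer, pred, conf, idx)
        if conf < threshold:     # assumed: below-threshold -> rejected
            noVotes.append(vote)
        elif pred == answer:
            goodVotes.append(vote)
        else:
            badVotes.append(vote)
    return goodVotes, badVotes, noVotes
```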
screens the results and shows a detailed workup; the work of doing the screening and processing the results is handled by _DetailedScreen()_

**Arguments**

- examples: the examples to be screened (a sequence of sequences); it's assumed that the last element in each example is its "value"
- composite: the composite model to be used
- nResultCodes: the number of possible results the composite can return
- threshold: the threshold to be used to decide whether or not a given prediction should be kept
- screenResults: (optional) the results of a previous screen (a sequence of 3-tuples in the format returned by _CollectResults()_). If this is provided, the examples will not be screened again.
- callback: (optional) if provided, this should be a function taking a single argument that is called, after each example is screened, with the number of examples screened so far as the argument
- appendExamples: (optional) this value is passed on to the composite's _ClassifyExample()_ method
- goodVotes, badVotes, noVotes: (optional) if provided, these should be lists (or anything supporting an _append()_ method) which will be used to pass the screening results back
- errorEstimate: (optional) calculate the "out-of-bag" error estimate for the composite using Breiman's definition. This only makes sense when screening the original data set! [L. Breiman, "Out-of-bag Estimation", UC Berkeley Dept. of Statistics Technical Report (1996)]

**Returns**

a 7-tuple:

1. the number of good (correct) predictions
2. the number of bad (incorrect) predictions
3. the number of predictions skipped due to the _threshold_
4. the average confidence in the good predictions
5. the average confidence in the bad predictions
6. the average confidence in the skipped predictions
7. the results table
screens a set of data using a composite model and prints out statistics about the screen; the work of doing the screening and processing the results is handled by _DetailedScreen()_

**Arguments**

- composite: the composite model to be used
- data: the examples to be screened (a sequence of sequences); it's assumed that the last element in each example is its "value"
- partialVote: (optional) toggles use of the threshold value in the screening
- voteTol: (optional) the threshold to be used to decide whether or not a given prediction should be kept
- verbose: (optional) sets the degree of verbosity of the screening
- screenResults: (optional) the results of a previous screen (a sequence of 3-tuples in the format returned by _CollectResults()_). If this is provided, the examples will not be screened again.
- goodVotes, badVotes, noVotes: (optional) if provided, these should be lists (or anything supporting an _append()_ method) which will be used to pass the screening results back

**Returns**

a 7-tuple:

1. the number of good (correct) predictions
2. the number of bad (incorrect) predictions
3. the number of predictions skipped due to the _threshold_
4. the average confidence in the good predictions
5. the average confidence in the bad predictions
6. the average confidence in the skipped predictions
7. None
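The two routines above return the same 7-tuple shape (they differ only in the final element), so a single helper can render either result. A sketch, with the helper name and the sample numbers purely illustrative:

```python
def reportScreen(screenTuple):
    """Print a one-line summary of the 7-tuple described above."""
    nGood, nBad, nSkip, avgGood, avgBad, avgSkip, table = screenTuple
    nKept = nGood + nBad
    accuracy = nGood / nKept if nKept else 0.0
    print(f"{nGood + nBad + nSkip} screened, {nSkip} rejected; "
          f"accuracy on kept points: {accuracy:.3f} "
          f"(avg conf: good {avgGood:.2f}, bad {avgBad:.2f}, "
          f"skipped {avgSkip:.2f})")

# illustrative values only:
reportScreen((80, 12, 8, 0.91, 0.64, 0.55, None))
```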
*Internal Use Only*: converts a list of 4-tuples (answer, prediction, confidence, idx) into an alternate list (answer, prediction, confidence, data point)

**Arguments**

- votes: a list of 4-tuples: (answer, prediction, confidence, index)
- data: a _DataUtils.MLData.MLDataSet_

**Note**: alterations are done in place in the _votes_ list
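The conversion amounts to swapping each vote's trailing index for the data point it refers to. A sketch of the idea only; how points are fetched from the _MLDataSet_ is an assumption here:

```python
def _expandVotes(votes, data):
    """Replace the index in each (answer, pred, conf, idx) vote with
    the corresponding data point, modifying `votes` in place
    (illustrative stand-in for the internal helper)."""
    for i, (answer, pred, conf, idx) in enumerate(votes):
        votes[i] = (answer, pred, conf, data[idx])  # assumed indexing
```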
screens a set of data using a _CompositeRun.CompositeRun_ instance to provide parameters; the actual data to be used are extracted from the database and table specified in _details_. Aside from dataset construction, _ShowVoteResults()_ does most of the heavy lifting here.

**Arguments**

- model: a composite model
- details: a _CompositeRun.CompositeRun_ object containing details (options, parameters, etc.) about the run
- callback: (optional) if provided, this should be a function taking a single argument that is called, after each example is screened, with the number of examples screened so far as the argument
- setup: (optional) a function taking a single argument which is called at the start of screening with the number of points to be screened as the argument
- appendExamples: (optional) this value is passed on to the composite's _ClassifyExample()_ method
- goodVotes, badVotes, noVotes: (optional) if provided, these should be lists (or anything supporting an _append()_ method) which will be used to pass the screening results back

**Returns**

a 7-tuple:

1. the number of good (correct) predictions
2. the number of bad (incorrect) predictions
3. the number of predictions skipped due to the _threshold_
4. the average confidence in the good predictions
5. the average confidence in the bad predictions
6. the average confidence in the skipped predictions
7. the results table
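The _setup_/_callback_ hooks make it easy to wire in progress reporting. A minimal sketch of a matching pair (the function names are illustrative); pass them as the _setup_ and _callback_ arguments of the routine above:

```python
import sys

def reportSetup(nPoints):
    # called once before screening starts, with the number of points
    sys.stderr.write(f"screening {nPoints} points\n")

def reportProgress(nDone):
    # called after each example, with the running count screened so far
    sys.stderr.write(f"\r  {nDone} done")
    sys.stderr.flush()
```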
returns the text of a web page showing the screening details

**Arguments**

- nGood: number of correct predictions
- nBad: number of incorrect predictions
- nRej: number of rejected predictions
- avgGood: average correct confidence
- avgBad: average incorrect confidence
- avgSkip: average rejected confidence
- voteTable: vote table
- imgDir: (optional) the directory to be used to hold the vote image (if constructed)

**Returns**

a string containing HTML
**Arguments**

- details: a CompositeRun.RunDetails object
- indices: a sequence of integer indices into _data_
- data: the data set in question. We assume that the ids for the data points are in the _idCol_ column
- goodVotes/badVotes: predictions where the model was correct/incorrect. These are sequences of 4-tuples: (answer, prediction, confidence, index into _indices_)
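The -E error analysis described earlier boils down to counting, per data point, how many of the screened composites missed it. A sketch of that tally over vote lists in the 4-tuple format above (the helper is illustrative, not this module's implementation):

```python
from collections import Counter

def tallyMisses(badVotesPerModel, indices):
    """Count how often each point is misclassified across all screened
    composites; each badVotes entry is (answer, pred, conf, idx), with
    idx pointing into `indices`."""
    missCounts = Counter()
    for badVotes in badVotesPerModel:
        for answer, pred, conf, idx in badVotes:
            missCounts[indices[idx]] += 1
    return missCounts
```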
prints the list of command-line arguments and then exits