Validate command

The validate command validates a trained model (ACP or CCP) or a TCP predictor against a file with known labels. This shows how well the predictor performs on data it has not seen. For TCP, the input can be either precomputed data or molecule files, which are converted into sparse data on the fly.

Parameters

The full usage menu can be retrieved by running the command:

> java -jar cpsign-[version].jar validate -h

                                             validate
SYNOPSIS
----------------------------------------------------------------------------------------------------
   validate [options]
   validate @/tmp/runconfigs/parameters.txt [options]
   validate @C:\Users\User\runconfigs\parameters.txt [options]


DESCRIPTION
----------------------------------------------------------------------------------------------------
   Use a test-file with existing (true) values to validate a Conformal Predictor. The normal
   execution will only report overall statistics, but it is possible to print all predicted result to
   json, smiles or sdf file format.


OPTIONS
----------------------------------------------------------------------------------------------------
   Input options:
      -m, --modelfile  [URI] or [path]
         (ACP/CCP) Existing trained model files (libsvm/liblinear format)
         (TCP) Existing precomputed data
      -t, --trainfile  [URI] or [path]
         (TCP) Training file in SDF or SMILES format
      -rn, --response-name  [text]
         (SDFile) Name of response value to model, should match a property in the train file
         (SMILES file) Name of the column to model, should match header of that column
      -l, --labels  [label1 label2] or [label1, label2]
         Label(s) for response values in classification mode. If a label is a negative numerical
         number, the minus sign must be escaped so that the command parser does not think it's a new
         option flag. E.g.: --labels [-1,1] (no blank-space permitted!) or --labels "\-1" 1

   Prediction options:
    * -co, --confidences  [confidence1 confidence2 ..] or [confidence1, confidence2, ..]
         Confidences for predictions (e.g. '0.5,0.7,0.9' or '0.5 0.7 0.9'). Should be in the range
         [0,1]
    * -p, --predictfile  [URI] or [path]
         File to use for validation. Accepted formats are SMILES-file (with either the true value in
         second column or indicating the correct value with the -vp [--validation-property] flag),
         SDFfile
      -vp, --validation-property  [text]
         (SDFile) Name of field with true value, should match a property in the predict file
         (SMILES file) Name of the column to use for validation, should match header of that column

   Modeling options:
    * -c, --cptype  [integer]
         Model type: 1) ACP classification, 2) ACP regression, 3) CCP classification, 4) CCP
         regression, 5) TCP classification

   (TCP only) modeling options:
      --cost  [number]
         User defined Cost value in SVM training
      --gamma  [number]
         User defined Gamma value in SVM training (only used in libsvm)
      --epsilon  [number]
         User defined Epsilon value in SVM training

   (TCP only) Signature generation options:
      -hs, --height-start  [integer]
         Signatures start height
         Default: 1
      -he, --height-end  [integer]
         Signatures end height
         Default: 3
      -sg, --signatures-generator  [text]
         Type of signatures that should be used, note that stereo-signatures take much longer time to
         compute. Options:
           normal (default)
           stereo (experimental mode)
         Default: default

   Encryption options:
      --two-factor-pin
         If two-factor encryption is used and key has a non-default PIN

   Printing predictions options:
      -P, --print
         Print the prediction output in json/smiles/sdf format (default is only printing overall
         statistics)
         Default: false
      -of, --output-format  [text value]
         output format of predictions, options:
          json
          smiles/plain
          sdf
         Default: json
      -o, --output  [path]
         File to write prediction output to (default is printing to screen)
      --compress
         If the outputfile should be compressed
         Default: false

   General options:
    * --license  [path]
         Path to license file
      --logfile  [path]
         Path to a user set logfile, will be specific for this run
      --silent
         Silent mode (only print output to logfile)
         Default: false
      --echo
         Echo the input arguments given to CPSign
         Default: false
      -h, --help
         Get help for this command
         Default: false
      --time
         Print wall-time for all individual steps in execution
         Default: false
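
The options above can also be assembled programmatically. The following is a minimal sketch, not part of CPSign, that builds the argument list for a TCP classification validation (cptype 5) using only flags listed in the usage menu. The helper names format_labels and build_validate_args are hypothetical, all paths and the property name are placeholders, and it assumes the true value is stored under the same property name in the train and predict files. Labels are rendered in the bracket form without blank spaces, as the --labels description requires for negative values.

```python
from typing import List, Sequence


def format_labels(labels: Sequence[int]) -> str:
    """Render labels in the bracket form, e.g. [-1,1], with no blank spaces."""
    return "[" + ",".join(str(label) for label in labels) + "]"


def build_validate_args(
    jar: str,
    license_path: str,
    train_file: str,
    predict_file: str,
    response_name: str,
    labels: Sequence[int],
    confidences: Sequence[float],
) -> List[str]:
    """Assemble argv for a TCP classification validation (hypothetical helper)."""
    return [
        "java", "-jar", jar, "validate",
        "--license", license_path,
        "-c", "5",                      # 5) TCP classification
        "-t", train_file,               # training file in SDF or SMILES format
        "-rn", response_name,           # property to model in the train file
        "-l", format_labels(labels),    # single token, so "-1" is not read as a flag
        "-p", predict_file,             # file with known (true) values
        "-vp", response_name,           # assumption: same property name in both files
        "-co", *[str(c) for c in confidences],
    ]


if __name__ == "__main__":
    # Placeholder values only; nothing is executed, the command is just printed.
    args = build_validate_args(
        jar="cpsign-[version].jar",
        license_path="/path/to/Standard-license.license",
        train_file="/path/to/trainfile.sdf",
        predict_file="/path/to/validatefile.sdf",
        response_name="class",
        labels=[-1, 1],
        confidences=[0.7, 0.8],
    )
    print(" ".join(args))
```

Passing such a list to a process launcher that takes an argument vector (rather than a shell string) hands the bracket form over as one token, so no shell quoting of the minus sign is involved.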

Example Usage

Example (ACP Regression):

> java -jar cpsign-[version].jar validate \
   --license /path/to/Standard-license.license \
   -c 2 \
   -vp BIO \
   -p /path/to/validatefile.sdf \
   -co 0.7 0.8 \
   -m /path/to/model.cpsign

Running with Standard license: License registered to: [Name] [Company] . Expiry date is: [Date]

Loading model from file..
 - Loaded model 1/5
 - Loaded model 2/5
 - Loaded model 3/5
 - Loaded model 4/5
 - Loaded model 5/5
Finished loading model

Starting doing predictions
Successfully predicted 34 molecules

====================================================================================================

Validation result for confidence level set to 0.7:
 - Accuracy: 1.0
 - RMSE: 0.088
 - R^2: 1.0
 - Median Prediction Interval Width: 5.186


Validation result for confidence level set to 0.8:
 - Accuracy: 1.0
 - RMSE: 0.088
 - R^2: 1.0
 - Median Prediction Interval Width: 10.186
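
To make the reported statistics concrete, here is a small sketch of how such regression metrics are commonly computed for interval predictions. It is an illustration under assumptions, not CPSign's implementation: it assumes that accuracy is the fraction of true values falling inside their prediction interval, that RMSE is computed from the point predictions, and that the interval width is the upper bound minus the lower bound. The function name interval_metrics and the data are made up for the example.

```python
import math
from statistics import median
from typing import Dict, Sequence, Tuple


def interval_metrics(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    intervals: Sequence[Tuple[float, float]],
) -> Dict[str, float]:
    """Compute accuracy, RMSE and median interval width (assumed definitions)."""
    n = len(y_true)
    # Accuracy: share of true values covered by their prediction interval.
    covered = sum(1 for y, (lo, hi) in zip(y_true, intervals) if lo <= y <= hi)
    # RMSE: based on point predictions only, independent of the confidence level.
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)
    # Width: size of each interval; the median summarizes typical tightness.
    widths = [hi - lo for lo, hi in intervals]
    return {
        "accuracy": covered / n,
        "rmse": rmse,
        "median_width": median(widths),
    }


if __name__ == "__main__":
    # Made-up values for illustration only.
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.1, 1.9, 3.2, 3.5]
    intervals = [(0.5, 1.5), (1.5, 2.5), (2.5, 3.5), (3.0, 3.9)]
    print(interval_metrics(y_true, y_pred, intervals))
```

Under these definitions, RMSE stays the same across confidence levels while the interval width grows as the confidence increases, which matches the pattern in the output above.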