Crossvalidate command

Crossvalidation can be performed with ACP or CCP, for both regression and classification. The command performs a k-fold crossvalidation, with the number of folds set by -k.
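
To make the k-fold idea concrete, the sketch below assigns record numbers to folds round-robin and prints which records would be held out in each fold. It is only an illustration of k-fold partitioning, not CPSign's own splitting logic (CPSign randomizes the training data, see --seed). The function name test_records and the record count of 10 are made up for the example.

```shell
# Print the 1-based record numbers that form the test set of one fold.
# Arguments: k (number of folds), n (number of records), f (fold index, 0-based).
# Hypothetical helper, not part of CPSign.
test_records() {
  awk -v k="$1" -v n="$2" -v f="$3" 'BEGIN {
    sep = ""
    for (i = 1; i <= n; i++)
      if ((i - 1) % k == f) { printf "%s%d", sep, i; sep = " " }
    print ""
  }'
}

k=5    # number of folds, as would be given with -k
n=10   # hypothetical number of records

fold=0
while [ "$fold" -lt "$k" ]; do
  echo "fold $fold: test on records $(test_records "$k" "$n" "$fold"), train on the rest"
  fold=$((fold + 1))
done
```

Each record ends up in the test set of exactly one fold, so every record is predicted once by a model that was not trained on it.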

Parameters

The full usage menu can be retrieved by running the command:

> java -jar cpsign-[version].jar crossvalidate -h

                                           crossvalidate
SYNOPSIS
----------------------------------------------------------------------------------------------------
   crossvalidate [options]
   crossvalidate @/tmp/runconfigs/parameters.txt [options]
   crossvalidate @C:\Users\User\runconfigs\parameters.txt [options]

DESCRIPTION
----------------------------------------------------------------------------------------------------
   Command crossvalidate performs a k-fold cross validation of the given dataset. This gives an
   estimate of how good predictions will be given this dataset and these settings.


OPTIONS
----------------------------------------------------------------------------------------------------
   Input options:
    * -t, --trainfile  [URI] or [path]
         Training file in SDF or SMILES format
      -rn, --response-name  [URI] or [path]
         (SDFile) Name of response value to model, should match a property in the train file
         (SMILES file) Name of the column to model, should match header of that column
      -l, --labels  [label1 label2] or [label1,label2]
         Label(s) for response values in classification mode. If a label is a negative numerical
         number, the minus sign must be escaped so that the command parser does not think it's a new
         option flag. E.g.: --labels [-1,1] (no blank-space permitted!) or --labels "\-1" 1

   Modeling options:
    * -c, --cptype  [integer]
         Model type: 1) ACP classification, 2) ACP regression, 3) CCP classification, 4) CCP
         regression
      -i, --impl
         Options: liblinear or libsvm
         Default: liblinear
      --cost  [number]
         User defined Cost value in SVM training
      --gamma  [number]
         User defined Gamma value in SVM training (only used in libsvm)
      --epsilon  [number]
         User defined Epsilon value in SVM training
      --nonconf-measure  [text value]
         Nonconformity score that should be used, see documentation for clarifications
         (Regression) Options: abs-diff, normalized or log-normalized
         Default: default
      --nonconf-beta  [decimal number]
         If log-normalized nonconformity score is chosen, optionally set a beta value (>= 0)
         Default: 0.0

   Cross Validation options:
    * -k, --cv-folds  [integer]
         Number of folds in cross validation (min 2 folds)
         Default: 0
      -co, --confidence  [decimal number]
         Confidence used in cross validation (range [0,1])
         Default: 0.8
      -nr, --nr-models  [integer]
         Number of ACP models or CCP folds (min 1 for ACP and min 2 for CCP)
         Default: 1
      -cr, --calibration-ratio  [decimal number]
         (ACP) Part of training set used as calibration set, range (0,1)
         Default: 0.2
      --seed  [integer]
         Set this flag if an explicit seed should be used in randomization of training data, default
         is using a random seed

   Signature generation options:
      -hs, --height-start  [integer]
         Signatures start height
         Default: 1
      -he, --height-end  [integer]
         Signatures end height
         Default: 3
      -sg, --signatures-generator  [text]
         Type of signatures that should be used, note that stereo-signatures take much longer time to
         compute. Options:
           normal (default)
           stereo (experimental mode)
         Default: default

   General options:
    * --license  [path]
         Path to license file
      --logfile  [path]
         Path to a user set logfile, will be specific for this run
      --silent
         Silent mode (only print output to logfile)
         Default: false
      --echo
         Echo the input arguments given to CPSign
         Default: false
      -h, --help
         Get help for this command
         Default: false
      --time
         Print wall-time for all individual steps in execution
         Default: false
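
The --labels help text above gives two ways to pass a negative label. The sketch below does not run CPSign; it only prints the arguments as the shell would hand them to the Java process, which can help when checking quoting. The helper show_args is made up for this example, and quoting the bracketed form is a precaution added here (an unquoted [-1,1] can be treated as a filename pattern by the shell), not something the help text requires.

```shell
# Print each argument on its own line, wrapped in <>, exactly as received.
# Hypothetical helper, not part of CPSign.
show_args() { printf '<%s>\n' "$@"; }

# Form 1: bracketed list without blank space. Single quotes keep the shell
# from interpreting [-1,1] as a filename pattern.
form1=$(show_args --labels '[-1,1]')

# Form 2: escaped minus sign. Inside double quotes the backslash is kept,
# so the argument passed on is \-1 rather than -1.
form2=$(show_args --labels "\-1" 1)

printf '%s\n' "$form1"
printf '%s\n' "$form2"
```

In both forms no argument after --labels starts with a bare minus sign, which is what the help text asks for.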

Example Usage

Example (ACP classification):

> java -jar cpsign-[version].jar crossvalidate \
   --license /path/to/Standard-license.license \
   -c 1 \
   -t /path/to/datafile.sdf \
   -rn Activity \
   -l POS NEG \
   -k 5

Running with Standard license: License registered to: [Name] [Company] . Expiry date is: [Date]

Reading train file and performing signature generation..
Parsed: 964 molecules from SDFile. Detected labels: 'POS'=379, 'NEG'=585

Cross validation finished with the following stats:
Efficiency=0.192
Validity=0.808

Example (ACP regression):

> java -jar cpsign-[version].jar crossvalidate \
   --license /path/to/Standard-license.license \
   -c 2 \
   -t /path/to/datafile.sdf \
   -rn BIO \
   --cv-folds 5


Running with Standard license: License registered to: [Name] [Company] . Expiry date is: [Date]

Reading train file and performing signature generation..
Parsed: 34 molecules from SDFile.

Cross validation finished with the following stats:
Efficiency=10.11
RMSE=5.534
Validity=0.714
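
The stats are printed as Name=value lines, so they are easy to pick up in a script. The sketch below works on the output of the regression run above, pasted in as a string, and does not call CPSign itself. The helper get_stat is made up for this example, and the confidence of 0.8 is the documented default for -co, assumed here because the example command does not set it.

```shell
# Output copied from the regression example above.
output='Cross validation finished with the following stats:
Efficiency=10.11
RMSE=5.534
Validity=0.714'

# Print the value of one metric from the Name=value lines.
# Hypothetical helper, not part of CPSign.
get_stat() {
  printf '%s\n' "$output" | awk -F= -v name="$1" '$1 == name { print $2 }'
}

validity=$(get_stat Validity)
rmse=$(get_stat RMSE)
printf 'Validity=%s RMSE=%s\n' "$validity" "$rmse"

# Compare the validity with the confidence assumed for the run.
confidence=0.8
if awk -v v="$validity" -v c="$confidence" 'BEGIN { exit !(v < c) }'; then
  echo "Validity $validity is below the confidence $confidence"
fi
```

In conformal prediction the validity is expected to land near the chosen confidence, so a check like this can flag runs worth a closer look. With only 34 molecules, as in this example, the estimate is likely to be noisy.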