Tune command

The tune command optimizes the Support Vector Machine parameters C (cost) and gamma. The default values used in CPSign normally work well for SVM problems that use signature descriptors, but you can optionally tune the parameters with this command.


The full usage menu can be retrieved by running the command:

> java -jar cpsign-[version].jar tune -h

   tune [options]
   tune @/tmp/runconfigs/parameters.txt [options]
   tune @C:\Users\User\runconfigs\parameters.txt [options]

   Perform an exhaustive grid search to find optimal parameter values for Cost and Gamma. For
   regression problems using the log-normalized nonconformity measure, it is also possible to grid
   search for optimizing the beta-parameter of the nonconformity measure.

   Input options:
    * -t, --trainfile  [URI] or [path]
         Training file in SDF or SMILES format
      -rn, --response-name  [text]
         (SDFile) Name of response value to model, should match a property in the train file
         (SMILES file) Name of the column to model, should match header of that column
      -l, --labels  [label1 label2] or [label1,label2]
         Label(s) for response values in classification mode. If a label is a negative numerical
         number, the minus sign must be escaped so that the command parser does not think it's a new
         option flag. E.g.: --labels [-1,1] (no blank-space permitted!) or --labels "\-1" 1

   Modeling options:
    * -c, --cptype  [integer]
         Model type: 1) ACP classification, 2) ACP regression, 3) CCP classification, 4) CCP
         regression
      -i, --impl  [text value]
         Options: liblinear or libsvm
         Default: liblinear
      --nonconf-measure  [text value]
         Nonconformity score that should be used, see documentation for clarifications
         (Regression) Options: abs-diff, normalized or log-normalized
         Default: default
      --nonconf-beta  [decimal number]
         If log-normalized nonconformity score is chosen, optionally set a beta value (>= 0)
         Default: 0.0

   Grid Search options:
      -op, --optimization  [text value]
         The criterion that should be used for optimizing the parameters, Options: efficiency,
         validity and rmse (only for regression)
         Default: efficiency
      --gamma-range  [start:step:end]
         The range of gamma values that should be used, specified as 'start:step:end' (only integers
         allowed). The values tested will be {2^start, 2^(start+step),..,2^end}
         Default: -5:2:3
      --cost-range  [start:step:end]
         The range of cost values that should be used, specified as 'start:step:end' (only integers
         allowed). The values tested will be {2^start, 2^(start+step),..,2^end}
         Default: -5:2:15
      --beta-values  [decimal value, decimal value, ..] or [decimal value decimal value ..]
         (Regression) If log-normalized nonconformity measure is used, tune the beta value by giving a
         list of values that should be tested. Beta values must be >= 0

   Cross Validation options:
    * -k, --cv-folds  [integer]
         Number of folds in the cross validation (min 2 folds)
         Default: 0
      -co, --confidence  [decimal number]
         Confidence used in the cross validation, range [0,1]
         Default: 0.8
      -nr, --nr-models  [integer]
         Number of ACP models or CCP folds (min 1 for ACP and min 2 for CCP)
         Default: 1
      -cr, --calibration-ratio  [decimal number]
         (ACP) Part of training set used as calibration set, range (0,1)
         Default: 0.2
      --seed  [integer]
         Set this flag if an explicit seed should be used in randomization of training data, default
         is using a random seed

   Signature generation options:
      -hs, --height-start  [integer]
         Signatures start height
         Default: 1
      -he, --height-end  [integer]
         Signatures end height
         Default: 3
      -sg, --signatures-generator  [text]
         Type of signatures that should be used, note that stereo-signatures take much longer time to
         compute. Options:
           normal (default)
           stereo (experimental mode)
         Default: default

   Output options:
      -a, --all
         Print all grid search results (otherwise will just print the optimal result)
         Default: false
      -o, --output  [path]
         File to write all grid search results to

   General options:
    * --license  [path]
         Path to license file
      --logfile  [path]
         Path to a user set logfile, will be specific for this run
         Silent mode (only print output to logfile)
         Default: false
         Echo the input arguments given to CPSign
         Default: false
      -h, --help
         Get help for this command
         Default: false
         Print wall-time for all individual steps in execution
         Default: false

Picking parameter search space

To search a larger parameter space than the default, set the ranges of the C and gamma values explicitly. Each range is configured by three integers, start, step and end, given to the flags --gamma-range and --cost-range. The parameter values that will be tried are the set {2^start, 2^(start+step),..,2^end}. When searching a large parameter space, you can begin with a coarse-grained search using a larger step size and, once the region of interest has been found, lower the step size and run a fine-grained search there. Tuning is currently only available in ACP and CCP modes, but the parameters obtained in ACP/CCP are transferable to the TCP case.
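To get a feeling for how large a grid a given pair of ranges produces, the expansion of 'start:step:end' can be sketched in a few lines of Python. This is only an illustration and not part of CPSign: the helper names parse_range, expand_range and refine_range are hypothetical, the sketch assumes a positive step, and it assumes that the expansion stops at the last exponent that does not exceed end. The refinement rule (one coarse step on either side of the best exponent, step size 1) is just one possible choice.

```python
def parse_range(spec):
    """Parse a 'start:step:end' string into three integers."""
    start, step, end = (int(part) for part in spec.split(":"))
    if step <= 0:
        # Assumption of this sketch: only positive steps are considered.
        raise ValueError("step must be a positive integer")
    if end < start:
        raise ValueError("end must not be smaller than start")
    return start, step, end


def expand_range(spec):
    """Return the values 2^start, 2^(start+step), .., 2^end as floats."""
    start, step, end = parse_range(spec)
    return [2.0 ** exponent for exponent in range(start, end + 1, step)]


def refine_range(best_exponent, coarse_step):
    """Hypothetical helper: build a finer range around the best coarse exponent."""
    return f"{best_exponent - coarse_step}:1:{best_exponent + coarse_step}"


gammas = expand_range("-5:2:3")    # default --gamma-range
costs = expand_range("-5:2:15")    # default --cost-range

print(gammas)                      # [0.03125, 0.125, 0.5, 2.0, 8.0]
print(len(gammas) * len(costs))    # 55 combinations of C and gamma
print(refine_range(3, 2))          # 1:1:5
```

With the default ranges the grid holds 5 gamma values and 11 cost values, so 55 parameter combinations are evaluated, each of them by cross validation. If a coarse search with step 2 found its best result at exponent 3, the last line suggests following up with the range 1:1:5 for that parameter.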

Parameter tuning β

The smoothing factor, β, of the logarithmically normalized nonconformity measure introduced in "Nonconformity measures" can also be optimized with the tune command. This works slightly differently than for the C and gamma values: instead of a range, you give a list of the β values that you wish to test to the --beta-values flag (provided that you have chosen the logarithmically normalized nonconformity measure in a regression case):

> java -jar cpsign-[version].jar tune \
  --beta-values 0.0 0.1 0.2 0.5 \