Validate

The validate program performs a validation of a trained model. The predictor can thus be evaluated to see how well it performs on unknown data.

Parameters

The full usage menu can be retrieved by running the command:

> java -jar cpsign-[version].jar validate -h

                                         validate
SYNOPSIS
------------------------------------------------------------------------------------------
  validate [options]
  validate @/tmp/runconfigs/parameters.txt [options]
  validate @C:\Users\User\runconfigs\parameters.txt [options]


DESCRIPTION
------------------------------------------------------------------------------------------
  Use a test-file with existing (true) labels to validate a Predictor. The normal
  execution will only report overall statistics, but it is possible to print all predicted
  result to json, smiles or sdf file format.


OPTIONS
------------------------------------------------------------------------------------------
  Input:
  * -mi | --model-in                         [URI | path]
       Trained CPSign model
  * -p  | --predictfile                      [URI | path]
       File to use for validation. Accepted formats are SMILES-file (with either the true
       value in second column or indicating the correct value with the -vp
       [--validation-property] flag), SDFfile or JSON
    -ve | --validation-endpoint              [text]
       (SDFile) Name of field with true label, should match a property in the predict file
       (SMILES) Name of the column to use for validation, should match header of that
       column
       (JSON) JSON-key for the property with the true response value

  Validation:
    -co | --confidences           [confidence confidence .. ] | [confidence,confidence,..]
       Confidences for predictions (e.g. '0.5,0.7,0.9' or '0.5 0.7 0.9'). Should be in the
       range [0,1]
       Default: 0.8

  Output:
    --print
       Print the prediction output in json/smiles/sdf format (default is only printing
       overall statistics)
    -of | --output-format                    [text]
       Output format of predictions (only applicable if the --print flag is given),
       options:
         (1) json
         (2) smiles | plain
         (3) sdf | sdf-v2000
         (4) sdf-v3000
       Default: 1
    -o  | --output                           [path]
       File to write prediction output to (default is printing to screen). Giving this
       parameter sets the --print flag to true
    --output-inchi
       Generate InChI and InChIKey in the output
    --compress
       If the outputfile should be compressed (only possible when writing to file)

  Encryption:
    --two-factor-pin
       If two-factor encryption is used and key has a non-default PIN

  General:
  * --license                                [URI | path]
       Path or URI to license file
    -h  | --help | man
       Get help text
    --short
       Use shorter help text (used together with the --help argument)
    --logfile                                [path]
       Path to a user-set logfile, will be specific for this run
    --silent
       Silent mode (only print output to logfile)
    --echo
       Echo the input arguments given to CPSign
    --time
       Print wall-time for all individual steps in execution

------------------------------------------------------------------------------------------
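When validation is launched from a script, the options listed above can be assembled programmatically. The following is a minimal sketch in Python, not part of CPSign: the helper name `build_validate_command` and all file paths are hypothetical, and only flags shown in the usage menu are used.

```python
import shlex


def build_validate_command(jar, license_file, model, predict_file,
                           endpoint=None, confidences=(0.8,),
                           output=None, output_format=None):
    """Assemble the argument list for a `validate` run (hypothetical helper)."""
    for conf in confidences:
        # The usage menu states that confidences should be in the range [0,1].
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence {conf} is outside [0,1]")

    cmd = ["java", "-jar", jar, "validate",
           "--license", license_file,
           "--model-in", model,
           "--predictfile", predict_file,
           # Comma-separated form, e.g. '0.5,0.7,0.9'.
           "--confidences", ",".join(str(c) for c in confidences)]
    if endpoint is not None:
        cmd += ["--validation-endpoint", endpoint]
    if output is not None:
        # Per the usage menu, giving --output sets the --print flag.
        cmd += ["--output", output]
    if output_format is not None:
        if output is None:
            # --output-format only applies when predictions are printed.
            cmd.append("--print")
        cmd += ["--output-format", output_format]
    return cmd


# Placeholder paths, mirroring the style of the example usage.
command = build_validate_command(
    jar="cpsign-[version].jar",
    license_file="/path/to/Standard-license.license",
    model="/path/to/model.cpsign",
    predict_file="/path/to/validatefile.sdf",
    endpoint="Ames test categorisation",
    confidences=(0.7, 0.8, 0.9),
    output="/path/to/predictions.json",
    output_format="json",
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the placeholder paths are replaced with real ones.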

Example Usage

Example (ACP Classification):

> java -jar cpsign-[version].jar validate \
   --license /path/to/Standard-license.license \
   --validation-endpoint "Ames test categorisation" \
   -p /path/to/validatefile.sdf \
   -co 0.7 0.8 0.9 \
   -mi /path/to/model.cpsign

Running with Standard License registered to [Name] at [Company]. Expiry
date is [Date]

Loading model..
Loaded an ACP classification predictor with 2 aggregated models. Model has been trained
from 123 training examples. The model endpoint is 'Ames test categorisation'. Class labels
are 'nonmutagen' and 'mutagen'.

Starting to perform validation..
 - Predicted 100/126 molecules
Successfully predicted 126 molecules

==========================================================================================

Validation result for confidence level set to 0.7:
 - Accuracy: 0.976
 - Single label predictions: 0.992
 - Double label predictions: 0.008
 - Mean classification confidence: 0.963
 - Mean classification credibility: 0.765


Validation result for confidence level set to 0.8:
 - Accuracy: 0.984
 - Single label predictions: 0.905
 - Double label predictions: 0.095
 - Mean classification confidence: 0.963
 - Mean classification credibility: 0.765


Validation result for confidence level set to 0.9:
 - Accuracy: 0.992
 - Single label predictions: 0.849
 - Double label predictions: 0.151
 - Mean classification confidence: 0.963
 - Mean classification credibility: 0.765

In this example the model was validated on the same input file it was trained on, so the results are much better than would normally be expected: the accuracies are well above the requested confidence levels. When validating on a previously unseen validation set, the accuracies should be close to the desired confidence levels.
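The comparison between requested confidence and observed accuracy can be automated. The sketch below, in Python, parses the statistics printed in the example run and reports the gap per confidence level. It is an illustration only: the function names are hypothetical, the regular expressions assume the screen output keeps the exact layout shown above, and the sample text is copied from that example.

```python
import re

# Statistics copied from the example run above.
SAMPLE_OUTPUT = """\
Validation result for confidence level set to 0.7:
 - Accuracy: 0.976
 - Single label predictions: 0.992
 - Double label predictions: 0.008

Validation result for confidence level set to 0.8:
 - Accuracy: 0.984
 - Single label predictions: 0.905
 - Double label predictions: 0.095

Validation result for confidence level set to 0.9:
 - Accuracy: 0.992
 - Single label predictions: 0.849
 - Double label predictions: 0.151
"""

HEADER = re.compile(r"Validation result for confidence level set to ([0-9.]+):")
METRIC = re.compile(r"^\s*-\s*(.+?):\s*([0-9.]+)\s*$")


def parse_validation_output(text):
    """Return {confidence: {metric name: value}} from the printed statistics."""
    results = {}
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            current = results.setdefault(float(header.group(1)), {})
            continue
        metric = METRIC.match(line)
        if metric and current is not None:
            current[metric.group(1)] = float(metric.group(2))
    return results


def calibration_gaps(results):
    """Observed accuracy minus requested confidence, per confidence level."""
    return {conf: round(stats["Accuracy"] - conf, 3)
            for conf, stats in results.items()}


results = parse_validation_output(SAMPLE_OUTPUT)
for conf, gap in sorted(calibration_gaps(results).items()):
    print(f"confidence {conf}: accuracy {results[conf]['Accuracy']} (gap {gap:+.3f})")
```

A large positive gap at every level, as in this example, suggests that the validation data overlaps with the training data. On an unseen validation set the gaps should be close to zero.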