Using CPSign from the command line

CPSign can be used for a number of different tasks within machine learning and compound chemistry. The standard workflow consists of these steps:

  1. (Optional) Use Tune to find optimal SVM settings for your dataset (otherwise good default settings are used, according to [6]).
  2. (Optional) Use Precompute to convert chemical file format into a sparse numerical representation.
  3. Use Train to train the predictor (not required for TCP, but greatly speeds up the prediction time!).
  4. Use Predict to predict new compound(s).
  5. (Optional) Use Validate to validate the model, if you have an external test-set.

This page contains the general options. For each specific command, please follow the links above or use the navigation on the left.

Get usage help

Usage information can be accessed at the top level, which lists the commands that are accepted:

> java -jar cpsign-[version].jar -h
...
The following main commands are accepted:
   precompute
   train
   aggregate
   predict
   tune
   crossvalidate
   validate
   gensign

To see the parameters for each command, run java -jar cpsign-[version].jar [command] -h

Usage help is split up per main command. Required parameters are marked with an asterisk (*) before the parameter. All parameters that take arguments have the required type of the argument specified after the parameter flags. Example:

> java -jar cpsign-[version].jar precompute -h [other options]

   ...

OPTIONS
----------------------------------------------------------------------------------------------------
   Input options:
      -m, --modelfile  [URI] or [path]
         An existing model file to take signatures descriptors from (allowing the two models to be
         joined later on)
    * -t, --trainfile  [URI] or [path]
         Training file in SDF or SMILES format
      -rn, --response-name  [text]
         (SDFile) Name of response value to model, should match a property in the train file
         (SMILES file) Name of the column to model, should match header of that column
      -l, --labels  [label1 label2] or [label1, label2]
         Label(s) for response values in classification mode
      -c, --cptype  [integer]
         Type: 1) classification, 2) regression
         Default: 1

   ...

So the flag -m (--modelfile) takes either a [URI] (Uniform Resource Identifier) or a [path], which is a standard file path on your computer. Parameters that are not followed by "[type]" are flags that do not accept any further arguments.

Use a configuration file

The command line tool supports @syntax, meaning that all parameters can be specified in files so that you do not have to rewrite all parameters for each run. You can also use multiple files and mix files and plain arguments on the command line. Note that each parameter must be on its own line; even a flag and its argument must be separated by a new line.

Note: The command name (e.g. train, predict) always has to come as the first parameter, regardless of whether parameters are specified in a configuration file or passed directly.

train_config.txt:

train
--license
/path/to/cpsign0.3-standard.license
--cptype
4
--calibration-ratio
0.1
--nr-models
10
--time
...
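Because every flag and every argument must be on its own line, it can be convenient to generate the file rather than type it by hand. The following is a minimal sketch, assuming a POSIX shell. The file location under the temporary directory is an assumption made for illustration, and the flags and values are only a subset of the placeholders from the example above.

```shell
# Write a CPSign @-file with one token per line (flag and argument on separate lines).
# The location under $TMPDIR is a placeholder chosen for this sketch.
config_file="${TMPDIR:-/tmp}/train_config.txt"

# printf repeats the '%s\n' format for every argument, so each token ends up on its own line.
printf '%s\n' \
    train \
    --license /path/to/cpsign0.3-standard.license \
    --cptype 4 \
    --calibration-ratio 0.1 \
    --nr-models 10 \
    > "$config_file"

# Sanity check for this example only: none of the tokens above contains a blank
# space, so a line with a space would mean two tokens ended up on the same line.
if grep -q ' ' "$config_file"; then
    echo "config contains multiple tokens on one line" >&2
else
    echo "config OK: $(wc -l < "$config_file" | tr -d ' ') lines"
fi
```

The resulting file can then be passed with the @ symbol in the same way as a hand-written one.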

The path given after the @ symbol can be any of the following:

  • An absolute path (e.g. /Users/username/Documents/runconfigs/config.txt on Unix/Linux systems or C:\Users\User\CPSign\runconfigs\config.txt on Windows systems)
  • A path relative to the directory in which the java -jar command is executed (e.g. runconfigs/config.txt or ..\CPSign\runconfigs\config.txt)
  • On Unix/Linux based systems, a path relative to the user home directory (e.g. ~/Documents/runconfigs/config.txt)

Example in bash:

> # An absolute path
> java -jar cpsign-[version].jar \
        @/Users/username/runconfigs/train_config.txt \
        [other options]

> # A user home relative path
> java -jar cpsign-[version].jar \
        @~/runconfigs/train_config.txt \
        [other options]

> # A relative path
> java -jar cpsign-[version].jar \
        @../runconfigs/train_config.txt \
        [other options]

Example in Windows Command Prompt:

:: Use absolute path (windows style)
C:\Users\User\CPSign> java -jar cpsign-0.5.1.jar @C:\Users\User\CPSign\runconfigs\train_config.txt

:: Use absolute path (*nix style)
C:\Users\User\CPSign> java -jar cpsign-0.5.1.jar @/Users/user/CPSign/runconfigs/train_config.txt

:: Use a relative path (forward slashes)
C:\Users\User\CPSign> java -jar cpsign-0.5.1.jar @../CPSign/runconfigs/train_config.txt

:: Use a relative path (back slashes)
C:\Users\User\CPSign> java -jar cpsign-0.5.1.jar @..\CPSign\runconfigs\train_config.txt

Get jar version

To get the unique jar version, which is more specific than the major.minor.update version, run CPSign with the --version flag:

> java -jar cpsign-0.3.7.jar --version

CPSign - Conformal Prediction with the signatures molecular descriptor.
(C) Copyright 2015-2016, GenettaSoft AB, www.genettasoft.com
Version: 0.3.7.20160826_162153

Configure Logging & Output

By default, CPSign writes information to the screen and to a rather verbose log file located in the same directory as the jar file. You can also run CPSign in silent mode, in which no output is written to the screen. If you want a separate log file for each run, or if you do not have write access to the directory where the CPSign jar is located, the log file location can be configured as well.

Flag: --silent
Description: Disables output to the screen; output is only written to the cpsign.log file.

> java -jar cpsign-[version].jar [command] --silent [other options]

Flag: --logfile
Description: Supply the path of a run-specific log file. Note that an existing file at that path will be overwritten. This option also disables logging to the standard cpsign.log, so that output is only logged to the specified file.

> java -jar cpsign-[version].jar [command] --logfile /path/to/logfile.log [other options]
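Since --logfile overwrites an existing file, one way to keep a separate log per run is to put a timestamp in the file name. The sketch below assumes a POSIX shell; the log directory and the naming pattern are assumptions made for illustration and are not something CPSign prescribes. The command is printed rather than executed, because the jar version, command and remaining options are placeholders.

```shell
# Build a per-run log file path from a timestamp.
# The directory and the run_<timestamp>.log pattern are placeholders for this sketch.
log_dir="${TMPDIR:-/tmp}/cpsign-logs"
mkdir -p "$log_dir"
logfile="$log_dir/run_$(date +%Y%m%d_%H%M%S).log"

# Print the command instead of running it; fill in the placeholders before use.
echo "java -jar cpsign-[version].jar [command] --logfile $logfile [other options]"
```

Two runs started within the same second would get the same file name, so a finer-grained or random suffix may be needed for batch jobs.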

Using a license file

You need a valid license file in order to run CPSign.

> java -jar cpsign-[version].jar [command] \
   --license /path/to/Standard-license.license \
   [other options]

Selecting modeling type

For the commands train, predict, tune and crossvalidate, you need to specify which modeling type should be used. The commands gensign and precompute do not require this right now, although precompute will need it once TCP regression is implemented.

Flag: -c, --cptype
Options:

  1. ACP classification
  2. ACP regression
  3. CCP classification
  4. CCP regression
  5. TCP classification
  6. TCP regression (not yet available)

Example:

> java -jar cpsign-[version].jar [command] -c 2 [other options]

Selecting modeling implementation

For commands train and crossvalidate you need to supply what type of underlying implementation should be used.

Flag: -i, --impl
Options: liblinear or libsvm (Default: liblinear)

Example (use libsvm instead of liblinear):

> java -jar cpsign-[version].jar [command] -i libsvm [other options]

Signature heights

Flag: -hs, --height-start
Description: Signatures start height (Default: 1)
Flag: -he, --height-end
Description: Signatures end height (Default: 3)

Example (set signature heights ranging from 0 to 4):

> java -jar cpsign-[version].jar [command] --height-start 0 --height-end 4 [other options]

Response

Flag: -rn, --response-name
Description: Name of the response value to model. For an SDF, it should match a property in the train file. If a SMILES file has the response value in a different column than the first column following the SMILES, give the header of the response column (see Input formats in CPSign).

Example:

> java -jar cpsign-[version].jar [command] -rn "Ames test categorisation" [other options]

Classification labels

Flag: -l, --labels
Description: Label(s) for response values in classification mode, blank-space or comma-separated. Should match entries in source data file (SDF or SMILES).

Example:

> java -jar cpsign-[version].jar [command] -l nonmutagen,mutagen [other options]

Note: If you have negative numeric labels, e.g. -1 and -2, they must be handled a bit differently, as they would otherwise be interpreted as new option flags. Here are some examples of how to do this:

> java -jar cpsign-[version].jar [command] -l "\-1" "\-2" [other options]

> java -jar cpsign-[version].jar [command] -l [-1,-2] [other options] **Note: no blank space between the labels**

> java -jar cpsign-[version].jar [command] -l "[-1 -2]" [other options] **Note the quotation marks**

Encryption

To encrypt data, pass the --encrypt flag to CPSign. The type of encryption (one- or two-factor) is determined by the license itself: a license is configured for either one-factor or two-factor encryption. If two-factor encryption is used, the physical encryption key must be connected to your system. When accessing encrypted data, whether precomputed data, signatures or models, CPSign handles the decryption without any additional flags, but the correct license must be supplied to the --license flag (that is, the license that the data was encrypted for, together with the physical encryption key if two-factor encryption is used).

Flag: --encrypt [path to license to encrypt data for]
Description: Specify that generated data should be encrypted for the given license. Note that a Standard or PRO license must be supplied to the --license flag to have access to this functionality.

Example encrypting for a Predict-license:

> java -jar cpsign-[version].jar [precompute or train] \
   --license /path/to/valid/pro/or/standard/license.license \
   --encrypt /path/to/valid/predict.license \
   [other options]

The generated signatures and data/models are now encrypted for the predict.license file, and that data can only be accessed by supplying the same predict license the next time. Note that the Standard or PRO license is only supplied to grant privileges for precomputing or training; that license cannot decrypt the data:

> java -jar cpsign-[version].jar [predict] \
   --license /path/to/predict.license \
   [other options]

"Three-factor" Encryption

Two-factor encryption also comes with the option of a non-default PIN code (i.e. "Three-factor" encryption). If you have a non-default PIN, it must be supplied both when generating encrypted data and when accessing it.

Flag: --two-factor-pin
Description: Pass the flag without any argument; the program will prompt you to enter the PIN when it starts.

> java -jar cpsign-[version].jar [precompute or train] \
   --license /path/to/valid/pro/or/standard/license.license \
   --encrypt /path/to/valid/predict.license \
   --two-factor-pin \
   [other options]
   If two-factor encryption is used and key has a non-default PIN: [enter your pin here]

Generating OSGi bundles

CPSign stores all models as OSGi-enabled jar files. The following output options can be set for the OSGi bundle:

Output options:
* -mo, --model-out  [path]
    Model file to generate. Either give a fully specified file including a valid file suffix
    (.cpsign, .osgi, .jar) or a directory where the model should be generated (cpsign will create
    a unique file name for you)
* -mn, --model-name  [text]
    Model name for the OSGi plugin
  -mc, --model-category  [text]
    The category of the model, will end up as model-endpoint in the OSGi
  -mv, --model-version  [text]
    Optional model version in SemVer versioning format
    Default: 1.0.0_2017-10-09_15:15:57.861

-mo, --model-out is a required parameter that takes a path to where the model should be stored. This can either be a fully qualified path to a specific file (which must not already exist), or a directory where the model should be stored. If the path only specifies a directory, CPSign will generate a file name from the arguments given to --model-name and --model-version. If that file already exists, an index is added to the end of the file name to make sure that old files are not overwritten.

Exit status codes

From v0.6.1, CPSign exits with a non-zero exit code if execution fails. The exit codes are the following:

0: Successful execution
1: Faulty command
2: Faulty arguments
4: Missing permissions
8: Out of memory
16: Program error
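
In a wrapper script, the exit status can be translated back into the descriptions listed above. This is a minimal sketch, assuming a POSIX shell and CPSign v0.6.1 or later; describe_cpsign_exit is a hypothetical helper name and not part of CPSign.

```shell
# Map a CPSign exit status to the description from the list of exit codes.
# describe_cpsign_exit is a hypothetical helper written for this sketch.
describe_cpsign_exit() {
    case "$1" in
        0)  echo "Successful execution" ;;
        1)  echo "Faulty command" ;;
        2)  echo "Faulty arguments" ;;
        4)  echo "Missing permissions" ;;
        8)  echo "Out of memory" ;;
        16) echo "Program error" ;;
        *)  echo "Unknown exit status: $1" ;;
    esac
}

# Intended usage (commented out, since it needs a real jar and license):
#   java -jar cpsign-[version].jar [command] [other options]
#   describe_cpsign_exit "$?"
describe_cpsign_exit 4   # prints "Missing permissions"
```

The fallback branch is there because the shell or the JVM can produce statuses outside this list, for example when the process is killed by a signal.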