Signatures Predictor

A Signatures Predictor is a wrapper placed on top of a Predictor. It adds functionality for handling predictions on chemical compounds described by the signatures descriptor [2].

Instantiation

There is one wrapper class for each Predictor type. Each can be instantiated either through CPSignFactory or through the constructor of the class:

  • CVAP : SignaturesVAPClassification class or CPSignFactory.createSignaturesVAPClassification
  • ACP Classification : SignaturesCPClassification class or CPSignFactory.createSignaturesCPClassification
  • ACP Regression : SignaturesCPRegression class or CPSignFactory.createSignaturesCPRegression
  • TCP Classification : SignaturesCPClassification class or CPSignFactory.createSignaturesCPClassification

Loading data & Predict

Once instantiated, the Signatures wrapper object offers two utility methods for loading training data from different sources: fromChemFile and fromMolsIterator. These methods are only accessible with the Standard or Pro licenses. fromChemFile loads data from SMILES files, SDFiles and JSON files (see Input formats in CPSign). Data can be loaded from multiple sources simply by calling fromChemFile or fromMolsIterator once for each file or data source. CPSign merges the loaded data, even when the sources use different formats.

import java.util.Arrays;
import java.util.List;

// From an SDFile or a JSON file, the endpoint to model must be supplied
List<String> labels = Arrays.asList("0", "1");
String endpoint = "class";
predictor.fromChemFile(dataFile.toURI(), endpoint, labels);
// From a SMILES file, no endpoint name is needed (if the modeling value is in the second column)
predictor.fromChemFile(dataFile.toURI(), null, labels);

CPSign version 0.6.0 introduced the possibility to use partitions of data exclusively for either training of models (the proper training set) or calibration. At the API level this is handled by the Dataset class, which holds a single dataset, and the Problem class, which now holds three datasets: dataset, calibrationExclusive and modelingExclusive. These can be manipulated directly. Alternatively, if the datasets are kept in separate files, call fromChemFile or fromMolsIterator with an extra argument of the enum type RecordType:

// Use the records in dataFile only for modeling (proper training)
predictor.fromChemFile(
     dataFile.toURI(), endpoint, labels, RecordType.MODELING_EXCLUSIVE);

// Use the records in molsIterator only for the calibration set
predictor.fromMolsIterator(molsIterator, RecordType.CALIBRATION_EXCLUSIVE);
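
To illustrate how the three partitions relate to each other, here is a minimal, self-contained sketch. It does not use any CPSign classes: PartitionSketch, Kind, Split, add and split are hypothetical names, and the way the shared records are divided (a fixed calibration ratio, taken in insertion order) is an assumption made purely for illustration, not a description of how CPSign samples data. The idea shown is that modeling-exclusive records only ever reach the proper training set, calibration-exclusive records only ever reach the calibration set, and ordinary records are divided between the two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: none of these types are part of the CPSign API.
class PartitionSketch {

    // Mirrors the three roles described in the text (names are assumptions).
    enum Kind { NORMAL, MODELING_EXCLUSIVE, CALIBRATION_EXCLUSIVE }

    // Result of one split: records used to fit the model vs. to calibrate it.
    static final class Split {
        final List<String> properTraining = new ArrayList<>();
        final List<String> calibration = new ArrayList<>();
    }

    private final List<String> dataset = new ArrayList<>();
    private final List<String> modelingExclusive = new ArrayList<>();
    private final List<String> calibrationExclusive = new ArrayList<>();

    // Add the records of one data source to the partition selected by kind.
    // Calling this once per source merges several sources into one problem.
    void add(List<String> records, Kind kind) {
        switch (kind) {
            case MODELING_EXCLUSIVE:
                modelingExclusive.addAll(records);
                break;
            case CALIBRATION_EXCLUSIVE:
                calibrationExclusive.addAll(records);
                break;
            default:
                dataset.addAll(records);
        }
    }

    // Divide the shared dataset by calibrationRatio (assumed strategy),
    // then append the exclusive records to their respective sets.
    Split split(double calibrationRatio) {
        Split s = new Split();
        int nCalib = (int) Math.round(dataset.size() * calibrationRatio);
        s.calibration.addAll(dataset.subList(0, nCalib));
        s.properTraining.addAll(dataset.subList(nCalib, dataset.size()));
        s.calibration.addAll(calibrationExclusive);
        s.properTraining.addAll(modelingExclusive);
        return s;
    }

    public static void main(String[] args) {
        PartitionSketch p = new PartitionSketch();
        p.add(Arrays.asList("mol1", "mol2", "mol3", "mol4"), Kind.NORMAL);
        p.add(Arrays.asList("mol5"), Kind.MODELING_EXCLUSIVE);
        p.add(Arrays.asList("mol6"), Kind.CALIBRATION_EXCLUSIVE);

        Split s = p.split(0.25);
        System.out.println("proper training: " + s.properTraining);
        System.out.println("calibration:     " + s.calibration);
    }
}
```

In this sketch the exclusive records are never mixed into the other set, regardless of the ratio chosen for the shared records.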

Saving and loading predictor models

Both the precomputed data and the trained predictor can be worth saving. Precomputed data is useful when you want to train several predictors, possibly with different scoring implementations or parameters. A trained predictor model can be used for later predictions and distributed to partners. Precomputed data can be saved through the BNDCreator class, whereas trained predictors can be saved either through the BNDCreator class or by calling the save() method of the Signatures wrapper class.

Image generation

To visualize the results of a prediction (i.e. the significant signature), please refer to the Image rendering page.