There is one wrapper class for each Predictor type. Instantiation is done either by
or the constructors of each class:
- CVAP :
- ACP Classification :
- ACP Regression :
- TCP Classification :
Once instantiated, the Signatures-wrapper object offers a set of utility methods for loading training data from a set of different
These methods are only accessible with the Standard or Pro licenses.
fromChemFile loads data from
SMILES, SDFiles and JSON files (see Input formats in CPSign). Note that you can load data from
multiple files this way, simply by calling
fromMolsIterator once for each file/data source.
CPSign can thereby merge multiple data sources, in multiple formats.
    // From a SDFile and JSON file, the endpoint to model must be supplied
    List<String> labels = Arrays.asList("0", "1");
    String endpoint = "class";
    predictor.fromChemFile(dataFile.toURI(), endpoint, labels);

    // From SMILES-file, no endpoint-name is needed (if modeling value is in second column)
    predictor.fromChemFile(dataFile.toURI(), null, labels);
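The merging behaviour described above can be pictured with a minimal sketch that is independent of CPSign. The names `MergeSketch`, `Record` and `addAll` are hypothetical stand-ins invented for this illustration, not CPSign classes or methods. The only assumption is that each load call appends its records to one shared dataset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only: none of these names belong to the CPSign API.
final class MergeSketch {

    /** A labelled record: a SMILES string plus its class label. */
    static final class Record {
        final String smiles;
        final String label;

        Record(String smiles, String label) {
            this.smiles = smiles;
            this.label = label;
        }
    }

    // Single dataset that accumulates records from every source.
    private final List<Record> dataset = new ArrayList<>();

    /** Appends every record from one source; call once per file/data source. */
    void addAll(Iterator<Record> source) {
        while (source.hasNext()) {
            dataset.add(source.next());
        }
    }

    int size() {
        return dataset.size();
    }

    public static void main(String[] args) {
        // Two sources that could have come from different file formats.
        List<Record> firstSource = Arrays.asList(
                new Record("CCO", "0"),
                new Record("c1ccccc1", "1"));
        List<Record> secondSource = Arrays.asList(
                new Record("CC(=O)O", "1"));

        MergeSketch merged = new MergeSketch();
        merged.addAll(firstSource.iterator());
        merged.addAll(secondSource.iterator());

        System.out.println("Merged records: " + merged.size());
    }
}
```

Each call only appends, so the order of calls determines the order of records and no source overwrites another.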
CPSign version 0.6.0 introduced the possibility to use partitions of data exclusively for either training of models (proper training)
or calibration. This is handled at the API level by the
Dataset.java class, which holds a single dataset. The
Problem.java class now holds three datasets: dataset, calibrationExclusive and modelingExclusive. These can be manipulated
directly or, if the datasets are kept in separate files, populated by calling
fromChemFile or fromMolsIterator with an extra argument that takes the enum
RecordType, as follows:
    // Use "dataFile" for only modeling
    predictor.fromChemFile(dataFile.toURI(), endpoint, labels, RecordType.MODELING_EXCLUSIVE);

    // Use records in molsIterator for only calibration set
    predictor.fromMolsIterator(molsIterator, RecordType.CALIBRATION_EXCLUSIVE);
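One way to picture how three datasets could feed a proper-training/calibration split is sketched below. This is not CPSign code: `PartitionSketch`, `Role` and the method names are hypothetical. The sketch assumes that exclusive records always go to their own partition, while the shared dataset is divided by a calibration ratio. It also splits deterministically for simplicity, where a real implementation would typically shuffle first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the CPSign Problem/Dataset implementation.
final class PartitionSketch {

    /** Hypothetical role of a record, loosely mirroring the idea of a record type. */
    enum Role { SHARED, CALIBRATION_EXCLUSIVE, MODELING_EXCLUSIVE }

    final List<String> dataset = new ArrayList<>();
    final List<String> calibrationExclusive = new ArrayList<>();
    final List<String> modelingExclusive = new ArrayList<>();

    /** Routes a record to one of the three datasets according to its role. */
    void add(String record, Role role) {
        switch (role) {
            case CALIBRATION_EXCLUSIVE:
                calibrationExclusive.add(record);
                break;
            case MODELING_EXCLUSIVE:
                modelingExclusive.add(record);
                break;
            default:
                dataset.add(record);
        }
    }

    /** Number of shared records assigned to calibration for the given ratio. */
    private int calibrationCount(double calibrationRatio) {
        return (int) Math.round(dataset.size() * calibrationRatio);
    }

    /** Modeling-exclusive records plus the leading part of the shared dataset. */
    List<String> properTrainingSet(double calibrationRatio) {
        int cut = dataset.size() - calibrationCount(calibrationRatio);
        List<String> out = new ArrayList<>(modelingExclusive);
        out.addAll(dataset.subList(0, cut));
        return out;
    }

    /** Calibration-exclusive records plus the trailing part of the shared dataset. */
    List<String> calibrationSet(double calibrationRatio) {
        int cut = dataset.size() - calibrationCount(calibrationRatio);
        List<String> out = new ArrayList<>(calibrationExclusive);
        out.addAll(dataset.subList(cut, dataset.size()));
        return out;
    }

    public static void main(String[] args) {
        PartitionSketch p = new PartitionSketch();
        p.add("s1", Role.SHARED);
        p.add("s2", Role.SHARED);
        p.add("s3", Role.SHARED);
        p.add("s4", Role.SHARED);
        p.add("m1", Role.MODELING_EXCLUSIVE);
        p.add("c1", Role.CALIBRATION_EXCLUSIVE);

        System.out.println("Proper training: " + p.properTrainingSet(0.25));
        System.out.println("Calibration:     " + p.calibrationSet(0.25));
    }
}
```

In this sketch an exclusive record can never end up in the other partition, whatever ratio is chosen for the shared records.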
Both the precomputed data and the finished trained predictor can be worth saving. Saving the precomputed
data makes it possible to train different predictors later, possibly using different scoring
implementations or parameters. The trained predictor model can be used for later predictions and
distributed to partners, for example. Precomputed models can be saved through the
whereas the trained predictors can be saved both using the
ModelCreator class and calling the
method of the Signatures wrapper class.