Apart from the CLI tool, CPSign can be used as an API (Application Programming Interface), so that it can be integrated into other projects and called programmatically. The API consists of a factory class that gives access to a set of interfaces and classes, provided that the user has a license for the product.
For the fastest way to get started with the CPSign API, please refer to our BitBucket page with programming examples that get you up and running in no time!
Using the API requires a license, just as the CLI does. To get access to the API, the factory class
CPSignFactory must be instantiated.
The constructors of this class handle license verification and permissions. The permissions are the same as for the CLI: a Predict license
can only load existing models, whereas Standard and Pro licenses have full permissions. Calling a method that requires
full permissions without having them will typically result in an InvalidLicenseException. There are four constructors:
    CPSignFactory factory = new CPSignFactory(license);
    CPSignFactory factory = new CPSignFactory(license, PIN);
    CPSignFactory factory = new CPSignFactory(ProLicense, encryptLicense);
    CPSignFactory factory = new CPSignFactory(ProLicense, encryptLicense, PIN);
The four constructors come in two flavours, and each flavour has a variant that takes an optional
PIN, which can be supplied in case you have a YubiKey with a unique PIN code.
1-2. These are the standard constructors. They accept any type of license, and in
case your license supports encryption the API will give access to encrypting models (see Encryption section).
3-4. These constructors take a Pro license as the first argument (which gives access to training models and generating datasets). The second argument is the encryption license, i.e. the license that datasets and models should be encrypted to. This means that the second argument must be either a Predict license or another Pro license. In case the YubiKey connected to the encryption license has a unique PIN, call the constructor that takes the PIN.
Apart from the TCP/ICP-specific functionality, the API also provides utility methods
that can be used for loading molecule files in either SDF or SMILES format. Building and
loading training data is handled by the CPSign library, but when making predictions, the
input is required to be IAtomContainers (molecule-holding objects from the CDK library,
see https://github.com/cdk/cdk). To make this somewhat transparent for the API user, the
static parse methods on CPSignFactory are provided for convenience. Calling them is as easy as:
    IAtomContainer molFromSMILES = CPSignFactory.parseSMILES(SMILESString);
    Iterator<IAtomContainer> molsFromSMILES = CPSignFactory.parseSMILESfile(dataStream);
    Iterator<IAtomContainer> molsFromSDF = CPSignFactory.parseSDF(dataStream);
    Iterator<IAtomContainer> molsIterator = CPSignFactory.parseChemFile(dataFile);
CPSign uses slf4j and logback internally, which can be configured by any API user. The default levels are Level.INFO for printing information to System.out and Level.ERROR for printing to System.err. If you, for instance, wish to disable all output from CPSign, simply set the appropriate level:
    // Get the root logger for cpsign
    ch.qos.logback.classic.Logger cpsignRoot =
        (ch.qos.logback.classic.Logger) org.slf4j.LoggerFactory.getLogger("com.genettasoft");
    // Disable all output
    cpsignRoot.setLevel(ch.qos.logback.classic.Level.OFF);
By default, CPSign uses parameters for LibLinear and LibSVM that have been found to produce good results together with signatures. However, it is possible to set all LibLinear and LibSVM parameters programmatically through the API. This is mostly important when using Sparse Prediction, where data might come from other sources than signatures. Setting parameters is as straightforward as:
    // Create an ACPClassification, ACPRegression or TCPClassification object
    ACPClassification acp = factory.createACPClassificationLibSVM();
    ICPSVMModel icpImpl = (ICPSVMModel) acp.getICPImplementation();

    // Either set parameters one by one
    icpImpl.setC(newC);
    icpImpl.setEpsilon(newEpsilon);

    // Or create new parameters from scratch
    svm_parameter svmParams = new svm_parameter();
    svmParams.C = 100;
    svmParams.eps = 0.5;
    icpImpl.setSVMParameters(svmParams);

    // For LibLinear (an alternative to the LibSVM snippet above, not a continuation of it):
    ACPClassification acp = factory.createACPClassificationLibLinear();
    ICPSVMModel icpImpl = (ICPSVMModel) acp.getICPImplementation();
    // Individual values can be set as above, or start from scratch
    Parameter liblinParams = new Parameter(SolverType.L1R_LR, 100, 0.5);
    icpImpl.setLibLinearParameters(liblinParams);
If you wish to do parameter tuning, we now support a grid search:
    // Create a GridSearch object (or use new GridSearch() for default values)
    GridSearch gs = new GridSearch(cvFolds, nrModels, calibrationRatio, confidence, tolerance);

    // Set your custom parameter regions
    gs.setC_END(6);

    // Set a Writer to write all output to (otherwise you only get the optimal result)
    Writer writer = new OutputStreamWriter(System.out);
    gs.setWriter(writer);

    // Initialize a Signatures or Sparse object with the ACP/CCP implementation you wish to tune
    SignaturesCPClassification signACP = factory.createSignaturesCPClassification(acpImpl, 1, 3);

    // Load the dataset
    signACP.fromChemFile(chemFile, "Ames test categorisation", Arrays.asList("nonmutagen", "mutagen"));

    // Perform the Grid Search
    GridSearchResult res = gs.gridsearchClassification(signACP, OptimizationType.OPTIMIZE_EFFICIENCY);
By default, the GridSearch only returns the optimal values in the produced GridSearchResult, but all results can be written out if you give a Writer to the GridSearch (or alternatively set the GridSearch logging level to DEBUG).
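To give a feeling for what a grid search does, the sketch below walks an exponential grid of cost values C = 2^k and keeps the value with the lowest score. It is an illustration only and not the CPSign GridSearch implementation: the class GridSearchSketch, the method searchC, the power-of-two grid and the toy objective are all assumptions made for this example. In a real run, the score for each C would come from cross-validating a model.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical, simplified grid search over the SVM cost parameter C. */
final class GridSearchSketch {

    /** Best cost value found and the score it achieved (lower is better here). */
    static final class Result {
        final double bestC;
        final double bestScore;

        Result(double bestC, double bestScore) {
            this.bestC = bestC;
            this.bestScore = bestScore;
        }
    }

    /** Evaluates score(C) for C = 2^k, k = startExp..endExp, and keeps the lowest score. */
    static Result searchC(int startExp, int endExp, int step, DoubleUnaryOperator score) {
        if (step <= 0 || startExp > endExp) {
            throw new IllegalArgumentException("need step > 0 and startExp <= endExp");
        }
        double bestC = Double.NaN;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int k = startExp; k <= endExp; k += step) {
            double c = Math.pow(2, k);
            double s = score.applyAsDouble(c);
            if (s < bestScore) {
                bestScore = s;
                bestC = c;
            }
        }
        return new Result(bestC, bestScore);
    }

    public static void main(String[] args) {
        // Toy objective with its minimum at C = 8; a stand-in for a cross-validated score
        DoubleUnaryOperator toyScore = c -> (c - 8) * (c - 8);
        Result result = searchC(-2, 6, 1, toyScore);
        System.out.println("best C = " + result.bestC + ", score = " + result.bestScore);
        // prints "best C = 8.0, score = 0.0"
    }
}
```

Every grid point costs one full evaluation, which is why the end points of the parameter region (compare gs.setC_END above) directly control the run time.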
Encrypting and decrypting models is done when saving and loading them. Methods such as
writeDatasetEncrypted allow the user to pass a parameter of type
EncryptionSpecification. This parameter specifies how
encryption/decryption should be done, and is specific to each license or license+YubiKey combination, so the encryption of your data is linked to your license.
An EncryptionSpecification is acquired from the
CPSignFactory class, through either of the
getEncryptionSpec methods. Either use the no-parameter
method to get the
EncryptionSpecification of the license passed when instantiating the factory, or use the static method, which gives you the
EncryptionSpecification of an arbitrary license.
Decrypting data is done by passing the same
EncryptionSpecification to the loading methods.
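The principle that data written with one specification can only be read back with that same specification can be illustrated with the standard Java cryptography classes. The sketch below does not show how CPSign encrypts anything internally: the class LicenseBoundEncryptionSketch, its methods and the choice of AES-GCM are assumptions made purely for illustration, with a randomly generated key standing in for whatever an EncryptionSpecification holds.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical stand-in showing that decryption needs the key used for encryption. */
final class LicenseBoundEncryptionSketch {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    /** Creates a fresh AES key; here it plays the role of one license's specification. */
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    /** Encrypts with AES-GCM and prepends the random IV to the ciphertext. */
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] body = cipher.doFinal(plain);
        byte[] out = Arrays.copyOf(iv, iv.length + body.length);
        System.arraycopy(body, 0, out, iv.length, body.length);
        return out;
    }

    /** Reads the IV from the front of the data and decrypts the remainder. */
    static byte[] decrypt(SecretKey key, byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
        return cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecretKey myKey = newKey();
        SecretKey otherKey = newKey();
        byte[] encrypted = encrypt(myKey, "model bytes".getBytes(StandardCharsets.UTF_8));

        // The same key reads the data back
        System.out.println(new String(decrypt(myKey, encrypted), StandardCharsets.UTF_8));

        // A different key is rejected by the authentication check
        try {
            decrypt(otherKey, encrypted);
        } catch (AEADBadTagException e) {
            System.out.println("other key cannot decrypt");
        }
    }
}
```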
Apart from loading molecules from SMILES files and SDFiles, we offer a more generic way
of populating a Signatures problem through the method
fromMolsIterator, which allows the
API user to wrap their data source in a Java Iterator. In this way, CPSign can integrate
with any type of data source, whether it is a database or anything else.
The API user simply wraps their data in an Iterator<Pair<IAtomContainer, Double>>,
where in the regression case the second value in the pair is the regression value,
and in the classification case the second value has to be either 0.0 or 1.0 for the two
classes (currently only binary classification is supported). The molecules must come as an
IAtomContainer as implemented in CDK (https://github.com/cdk/cdk). From v0.6.0, CPSign uses
CDK version 2.0.
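As a rough sketch of such a wrapper, the example below adapts rows from an arbitrary source to pairs of molecule and numeric class. Everything in it is hypothetical and kept free of dependencies so that it runs on its own: the Pair class stands in for the pair type expected by fromMolsIterator, ActivityRow for your own data records, and the parser function for whatever turns a SMILES string into an IAtomContainer (for instance CPSignFactory.parseSMILES). The demo uses plain strings in place of molecules.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for the Pair type used by fromMolsIterator. */
final class Pair<L, R> {
    final L left;
    final R right;

    Pair(L left, R right) {
        this.left = left;
        this.right = right;
    }
}

/** Hypothetical record from your own data source (database row, CSV line, ...). */
final class ActivityRow {
    final String smiles;
    final String label;

    ActivityRow(String smiles, String label) {
        this.smiles = smiles;
        this.label = label;
    }
}

/** Adapts an iterator of rows to pairs of (molecule, 0.0 or 1.0). */
final class LabelledMoleculeIterator<M> implements Iterator<Pair<M, Double>> {
    private final Iterator<ActivityRow> rows;
    private final String positiveLabel;
    private final Function<String, M> parser;

    LabelledMoleculeIterator(Iterator<ActivityRow> rows, String positiveLabel,
                             Function<String, M> parser) {
        this.rows = rows;
        this.positiveLabel = positiveLabel;
        this.parser = parser;
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    @Override
    public Pair<M, Double> next() {
        // rows.next() throws NoSuchElementException once the source is exhausted
        ActivityRow row = rows.next();
        double numericClass = positiveLabel.equals(row.label) ? 1.0 : 0.0;
        return new Pair<>(parser.apply(row.smiles), Double.valueOf(numericClass));
    }
}

final class IteratorSketch {
    public static void main(String[] args) {
        List<ActivityRow> rows = Arrays.asList(
                new ActivityRow("CCO", "nonmutagen"),
                new ActivityRow("Nc1ccccc1", "mutagen"));

        // Identity parser: in real code this would produce an IAtomContainer
        Iterator<Pair<String, Double>> data =
                new LabelledMoleculeIterator<String>(rows.iterator(), "mutagen", smiles -> smiles);

        while (data.hasNext()) {
            Pair<String, Double> pair = data.next();
            System.out.println(pair.left + " -> " + pair.right);
        }
    }
}
```

Because the adapter pulls rows one at a time, the whole dataset never has to be held in memory, which is the main benefit of handing over an Iterator rather than a list.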