Application Programming Interface

Apart from the CLI tool, CPSign can be used as an API (Application Programming Interface), integrated into other projects and called programmatically. The API consists of a factory class that gives access to a set of public interfaces and classes, provided that the user has a license for the product.

Programming examples

The fastest way to get started with the CPSign API is to look at the programming examples on our BitBucket page, which will get you up and running quickly.

Accessing the API

Using the API requires a license, just as the CLI does. To get access to the API, the factory class CPSignFactory must be instantiated. Its constructors handle license verification and permissions. The permissions are the same as for the CLI: a Predict license can only load existing models, whereas Standard and Pro licenses have full permissions to see and do everything. Calling a method that requires full permissions without having them will typically result in an InvalidLicenseException. There are four constructors:

CPSignFactory factory = new CPSignFactory(license);
CPSignFactory factory = new CPSignFactory(license, PIN);
CPSignFactory factory = new CPSignFactory(ProLicense, encryptLicense);
CPSignFactory factory = new CPSignFactory(ProLicense, encryptLicense, PIN);

The four constructors come in two flavours, and each flavour has a variant that takes an optional PIN, to be supplied if your YubiKey has a unique PIN code.

1-2. These are the standard constructors, which accept any type of license. If your license supports encryption, the API also gives access to encrypting models (see the section on encryption).
3-4. These constructors take a Pro license as the first argument (which gives access to training models or generating datasets). The second argument is the encryption license, i.e. the license to which datasets and models will be encrypted, so it must be either a Predict license or another Pro license. If the YubiKey connected to the encryption license has a unique PIN, use the constructor that takes the PIN.
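The license rules above can be condensed into a small decision table. The following self-contained sketch encodes only what this section states; the LicenseRules class, the LicenseType enum and both helper methods are hypothetical illustrations and not part of the CPSign API, which handles license verification itself in the CPSignFactory constructors.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the documented license rules; not part of CPSign. */
public class LicenseRules {

    enum LicenseType { PREDICT, STANDARD, PRO }

    /** Licenses accepted as the encryption license (second argument of constructors 3-4). */
    private static final Set<LicenseType> VALID_ENCRYPT_TARGETS =
            EnumSet.of(LicenseType.PREDICT, LicenseType.PRO);

    /** A Predict license can only load existing models; Standard and Pro have full permissions. */
    static boolean hasFullPermissions(LicenseType license) {
        return license != LicenseType.PREDICT;
    }

    /** Checks the argument combination of constructors 3-4: (ProLicense, encryptLicense[, PIN]). */
    static boolean isValidEncryptingFactory(LicenseType first, LicenseType encryptLicense) {
        return first == LicenseType.PRO && VALID_ENCRYPT_TARGETS.contains(encryptLicense);
    }

    public static void main(String[] args) {
        System.out.println(hasFullPermissions(LicenseType.PREDICT));                         // false
        System.out.println(isValidEncryptingFactory(LicenseType.PRO, LicenseType.PREDICT));  // true
        System.out.println(isValidEncryptingFactory(LicenseType.PRO, LicenseType.STANDARD)); // false
        System.out.println(isValidEncryptingFactory(LicenseType.STANDARD, LicenseType.PRO)); // false
    }
}
```

The last two checks show the two ways the second flavour can be misused: a Standard license as the encryption target, or a non-Pro license as the first argument.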

Utility Methods

Apart from the TCP/ICP-specific functionality, the API also provides four static utility methods for parsing molecules in either SDF or SMILES format. Building and loading training data is handled by the CPSign library, but when making predictions the input must be IAtomContainer objects (molecule-holding objects from the CDK library, see https://github.com/cdk/cdk). To make this somewhat transparent for the API user, the methods parseSMILES, parseSDF, parseSMILESfile and parseChemFile are provided for convenience. Calling them is as easy as:

IAtomContainer molFromSMILES = CPSignFactory.parseSMILES(SMILESString);
Iterator<IAtomContainer> molsFromSMILES = CPSignFactory.parseSMILESfile(dataStream);
Iterator<IAtomContainer> molsFromSDF = CPSignFactory.parseSDF(dataStream);
Iterator<IAtomContainer> molsIterator = CPSignFactory.parseChemFile(dataFile);
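An Iterator-based result can let a caller consume a large file one molecule at a time instead of holding it all in memory. The sketch below illustrates that lazy pattern using only the JDK; it is not CPSign's implementation. The SmilesLineIterator class is hypothetical, it assumes the common SMILES-file layout of one SMILES string per line optionally followed by whitespace and a molecule name, and it yields plain strings where parseSMILESfile yields IAtomContainer objects. Each yielded string could then be handed to CPSignFactory.parseSMILES.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: lazily yields the SMILES token of each non-blank line. */
public class SmilesLineIterator implements Iterator<String> {

    private final BufferedReader reader;
    private String nextSmiles; // look-ahead value; null once the stream is exhausted

    public SmilesLineIterator(InputStream in) {
        this.reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        advance();
    }

    /** Reads ahead to the next non-blank line and keeps its first whitespace-separated token. */
    private void advance() {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    nextSmiles = line.split("\\s+", 2)[0]; // drop the optional molecule name
                    return;
                }
            }
            nextSmiles = null;
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return nextSmiles != null;
    }

    @Override
    public String next() {
        if (nextSmiles == null) throw new NoSuchElementException();
        String current = nextSmiles;
        advance(); // only now is the following line read
        return current;
    }

    public static void main(String[] args) {
        String file = "CCO ethanol\n\nc1ccccc1 benzene\nCC(=O)O\n";
        Iterator<String> it = new SmilesLineIterator(
                new ByteArrayInputStream(file.getBytes(StandardCharsets.UTF_8)));
        while (it.hasNext()) {
            System.out.println(it.next()); // CCO, then c1ccccc1, then CC(=O)O
        }
    }
}
```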

Configure logging

CPSign uses SLF4J and Logback internally, and the logging can be configured by API users. The default levels are Level.INFO for output to System.out and Level.ERROR for output to System.err. For instance, to disable all output from CPSign, set the level of the CPSign root logger:

// Get the root logger for cpsign
ch.qos.logback.classic.Logger cpsignRoot = (ch.qos.logback.classic.Logger) org.slf4j.LoggerFactory.getLogger("com.genettasoft");
// Disable all output
cpsignRoot.setLevel(ch.qos.logback.classic.Level.OFF);

Setting your own LibSVM and LibLinear parameters

By default, CPSign uses LibLinear and LibSVM parameters that have been found to produce good results together with signatures [6]. However, all LibLinear and LibSVM parameters can also be set programmatically through the API. This is mainly relevant when using Sparse Prediction, where the data may come from sources other than signatures. Setting parameters is as straightforward as:

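// LibSVM and LibLinear parameter classes (package names of the standard Java ports)
import libsvm.svm_parameter;
import de.bwaldvogel.liblinear.Parameter;
import de.bwaldvogel.liblinear.SolverType;

// Create an ACPClassification, ACPRegression or TCPClassification object
ACPClassification acp = factory.createACPClassificationLibSVM();

ICPSVMModel icpImpl = (ICPSVMModel) acp.getICPImplementation();

// Either set parameters one by one
icpImpl.setC(newC);
icpImpl.setEpsilon(newEpsilon);

// Or create new parameters from scratch. Fields of a new svm_parameter that are
// not set explicitly keep Java's default value (0), so set every field you rely on
svm_parameter svmParams = new svm_parameter();
svmParams.C = 100;
svmParams.eps = 0.5;
icpImpl.setSVMParameters(svmParams);

// For LibLinear (new variable names, since acp and icpImpl are already declared above):
ACPClassification acpLinear = factory.createACPClassificationLibLinear();
ICPSVMModel icpLinear = (ICPSVMModel) acpLinear.getICPImplementation();

// Set individual values as above, or start from scratch (solver type, C, epsilon)
Parameter liblinParams = new Parameter(SolverType.L1R_LR, 100, 0.5);
icpLinear.setLibLinearParameters(liblinParams);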

Encrypting models and data in the API

Models are encrypted when they are saved and decrypted when they are loaded. The methods saveModelEncrypted, writeSignaturesEncrypted, writeModelFilesEncrypted and writeDatasetEncrypted accept a parameter of type EncryptionSpecification, which specifies how encryption/decryption should be done and is specific to each license or license+YubiKey combination. In other words, the encryption of your data is tied to your license. An EncryptionSpecification is obtained from the CPSignFactory class through one of the getEncryptionSpec methods: use the no-argument method to get the EncryptionSpecification of the license passed when instantiating the factory, or use the static method to get the EncryptionSpecification of an arbitrary license.

Decrypting data is done by passing the same EncryptionSpecification to the loading methods fromPrecomputed, loadSignatures, loadModelFiles or addModel.

Loading data from any datasource

Apart from loading molecules from SMILES files and SDFiles, we offer a more generic way of populating a Signatures problem: the method fromMolsIterator, which lets API users wrap their data source in a Java Iterator. This way, CPSign can integrate with any type of data source, whether it is a database or anything else. The API user simply wraps their data in an Iterator<Pair<IAtomContainer, Double>>, where the second value of each pair is the regression value in the regression case, and must be either 0.0 or 1.0 in the classification case (currently only binary classification is supported). The molecules must be IAtomContainer objects as implemented in CDK (https://github.com/cdk/cdk). Since v0.6.0, CPSign uses CDK version 2.0.
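The wrapping step can be sketched without CPSign or CDK on the classpath. In this hypothetical adapter, LabeledMoleculeIterator, the type parameter M stands in for IAtomContainer and java.util.Map.Entry stands in for the Pair type expected by fromMolsIterator; the row layout and the "active"/"inactive" labels are invented for illustration. The adapter enforces the documented rule that classification labels must be 0.0 or 1.0 and passes regression values through unchanged.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical adapter from an arbitrary row source to (molecule, label) pairs.
 * M stands in for IAtomContainer; Map.Entry stands in for CPSign's Pair type.
 */
public class LabeledMoleculeIterator<R, M> implements Iterator<Map.Entry<M, Double>> {

    private final Iterator<R> rows;
    private final Function<R, M> toMolecule;
    private final ToDoubleFunction<R> toLabel;
    private final boolean classification;

    public LabeledMoleculeIterator(Iterator<R> rows, Function<R, M> toMolecule,
                                   ToDoubleFunction<R> toLabel, boolean classification) {
        this.rows = rows;
        this.toMolecule = toMolecule;
        this.toLabel = toLabel;
        this.classification = classification;
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    @Override
    public Map.Entry<M, Double> next() {
        R row = rows.next();
        double label = toLabel.applyAsDouble(row);
        // Classification: only the binary labels 0.0 and 1.0 are accepted.
        // Regression: the value is passed through unchanged.
        if (classification && label != 0.0 && label != 1.0) {
            throw new IllegalArgumentException("Class label must be 0.0 or 1.0, got " + label);
        }
        return new AbstractMap.SimpleImmutableEntry<>(toMolecule.apply(row), label);
    }

    public static void main(String[] args) {
        // Rows as they might come from a database query: {SMILES, activity}
        List<String[]> rows = Arrays.asList(
                new String[] {"CCO", "inactive"},
                new String[] {"c1ccccc1O", "active"});

        Iterator<Map.Entry<String, Double>> data = new LabeledMoleculeIterator<String[], String>(
                rows.iterator(),
                row -> row[0],                              // real code would build an IAtomContainer here
                row -> "active".equals(row[1]) ? 1.0 : 0.0, // map class names to 0.0 / 1.0
                true);

        while (data.hasNext()) {
            Map.Entry<String, Double> entry = data.next();
            System.out.println(entry.getKey() + " -> " + entry.getValue()); // CCO -> 0.0, then c1ccccc1O -> 1.0
        }
    }
}
```

Validating labels inside the adapter surfaces a bad row as soon as it is read, and the same adapter works for regression by passing false as the last argument.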