Apart from the CLI tool, CPSign can be used as an API (Application Programming Interface), integrated into other projects, and called programmatically. The API consists of a factory class that gives access to a set of interfaces and classes open to use, provided that the user has a license for the product.
For the fastest way to get started using the CPSign API, please refer to our BitBucket page with programming examples that get you up and running in no time!
Using the API requires a license, just as the CLI does. To get access to the API, the factory class
CPSignFactory must be instantiated.
The constructors of this class handle license verification and permissions. The permissions are the same as for the CLI: a Predict license
can only load existing models, whereas Standard and Pro licenses have full permissions. Calling methods that require
full permissions without having them will typically result in an
InvalidLicenseException. There are four constructors:
CPSignFactory factory = new CPSignFactory(license);
CPSignFactory factory = new CPSignFactory(license, PIN);
CPSignFactory factory = new CPSignFactory(ProLicense, encryptLicense);
CPSignFactory factory = new CPSignFactory(ProLicense, encryptLicense, PIN);
The four constructors come in two flavors, and each flavor has an optional
PIN argument that can be supplied if your YubiKey has a unique PIN code.
1-2. These are the standard constructors, which accept any type of license. If
your license supports encryption, the API will give access to encrypting models (see the Encryption section).
3-4. These constructors take a Pro license as the first argument (which gives access to training models and generating datasets). The second argument is the encryption license, i.e. the license that datasets and models should be encrypted to. This means that the second argument must be either a Predict license or another Pro license. If the YubiKey connected to the encryption license has a unique PIN, call the constructor that takes the PIN.
Apart from the TCP/ICP-specific functionality, the API also provides a few utility methods
for loading molecule files in either SDF or SMILES format. Building and
loading training data is handled by the CPSign library, but when making predictions, the
input must be IAtomContainer objects (molecule-holding objects from the CDK library,
see https://github.com/cdk/cdk). To make this somewhat transparent for the API user, these
parsing methods are provided for convenience. Calling them is as easy as:
IAtomContainer molFromSMILES = CPSignFactory.parseSMILES(SMILESString);
Iterator<IAtomContainer> molsFromSMILES = CPSignFactory.parseSMILESfile(dataStream);
Iterator<IAtomContainer> molsFromSDF = CPSignFactory.parseSDF(dataStream);
Iterator<IAtomContainer> molsIterator = CPSignFactory.parseChemFile(dataFile);
CPSign uses slf4j and logback internally, which can be configured by any API user. The default levels are Level.INFO for output printed to System.out and Level.ERROR for output printed to System.err. If you, for instance, wish to disable all output from CPSign, simply set the appropriate level:
// Get the root logger for cpsign
ch.qos.logback.classic.Logger cpsignRoot =
    (ch.qos.logback.classic.Logger) org.slf4j.LoggerFactory.getLogger("com.genettasoft");
// Disable all output
cpsignRoot.setLevel(ch.qos.logback.classic.Level.OFF);
Parameter tuning is supported using the
GridSearch class which does an exhaustive search of parameter values:
// Create a GridSearch object
GridSearch gs = new GridSearch(cvFolds, confidence, tolerance); // or new GridSearch() with default values

// Set your custom parameter regions
gs.setC_END(6); // Either using the start, end, step parameters
gs.setC_values(Arrays.asList(50., 100., 150.)); // Or use explicit values

// Set a Writer to write all output to (otherwise only give you the optimal result)
Writer writer = new OutputStreamWriter(System.out);
gs.setWriter(writer);

// GridSearch can either be called on the "Signatures level"
SignaturesCPClassification signACP = ...; // Init and load data
GridSearchResult res = gs.classification(signACP, OptimizationType.EFFICIENCY);

// or at the "Problem level"
ACPClassification acp = ...; // Init with ML algorithm and Sampling strategy
Problem problem = ...; // Load data
GridSearchResult res = gs.classification(problem, acp, OptimizationType.EFFICIENCY);
By default, GridSearch only returns the optimal values in the produced GridSearchResult, but all results can be written out if you supply a Writer to the GridSearch (or alternatively set the GridSearch logging level to DEBUG).
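The idea of an exhaustive search over explicit parameter values can be pictured with a small, self-contained sketch. It is not CPSign's implementation: GridSearchSketch, Result and the scoring function are hypothetical names introduced purely for illustration, and the score stands in for whatever cross-validated metric the real GridSearch computes. The sketch mirrors the two behaviours mentioned in the text: only the optimum is returned, and every evaluated point is written to a Writer when one is supplied.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

/** Illustrative stand-in for an exhaustive grid search; NOT the CPSign GridSearch class. */
final class GridSearchSketch {

    /** Holds only the optimal parameter value and its score. */
    static final class Result {
        final double bestC;
        final double bestScore;

        Result(double bestC, double bestScore) {
            this.bestC = bestC;
            this.bestScore = bestScore;
        }
    }

    /**
     * Evaluates every candidate value and returns the one with the highest score.
     * The score function is an assumption: higher is taken to mean better.
     * If log is non-null, one line per evaluated candidate is written to it.
     */
    static Result search(List<Double> cValues, DoubleUnaryOperator score, Writer log) {
        if (cValues.isEmpty()) {
            throw new IllegalArgumentException("At least one candidate value is required");
        }
        double bestC = Double.NaN;
        double bestScore = Double.NEGATIVE_INFINITY;
        try {
            for (double c : cValues) {
                double s = score.applyAsDouble(c);
                if (log != null) {
                    log.write("C=" + c + " score=" + s + "\n");
                }
                if (s > bestScore) {
                    bestScore = s;
                    bestC = c;
                }
            }
            if (log != null) {
                log.flush();
            }
        } catch (IOException e) {
            // Surface writer failures without forcing callers to handle a checked exception
            throw new UncheckedIOException(e);
        }
        return new Result(bestC, bestScore);
    }
}
```

For example, calling search(Arrays.asList(50., 100., 150.), c -> -Math.abs(c - 100), null) returns a Result whose bestC is 100; passing a Writer instead of null additionally records all three evaluations.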
Encrypting and decrypting models is done when saving and loading them. Encryption requires a license that supports it;
from such a license an
EncryptionSpecification can be generated by CPSign, which then allows encryption using the API.
EncryptionSpecification objects are instantiated from the
CPSignFactory, either by sending the desired
encryption-enabled license to the constructor of
CPSignFactory or by calling one of the static methods
Once you have the encryption specification, models can be encrypted by calling the saving methods
saveEncrypted(File, EncryptionSpecification) or directly one of the
Loading an encrypted model is done by sending the same
EncryptionSpecification (i.e. derived from the same license)
to the loading method of
Apart from loading molecules from SMILES files and SDFiles, CPSign offers a more generic way
of populating a Signatures problem through the method
fromMolsIterator, which allows the
API user to wrap their data source in a Java Iterator. In this way, CPSign can integrate
with any type of data source, whether it is a database or anything else.
The API user simply wraps their data in an Iterator<Pair<IAtomContainer, Double>>,
where in the regression case the second value in the pair is the regression value
and in the classification case the second value should be one of two values (one value for each class).
The molecules must be supplied as IAtomContainer objects, as implemented in CDK (https://github.com/cdk/cdk).
From v0.6.0, CPSign uses CDK version 2.0.
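As a rough illustration of wrapping a data source in an Iterator, the sketch below adapts rows from an arbitrary source into an iterator of molecule/value pairs. To keep it runnable without CPSign or CDK on the classpath, it makes several assumptions: Pair, Row and RowIterator are hypothetical stand-ins (CPSign's own Pair type is not shown in this text), and the molecule type is left generic as M where real code would use IAtomContainer.

```java
import java.util.Iterator;
import java.util.function.Function;

/** Hypothetical minimal pair type, standing in for the Pair used by CPSign. */
final class Pair<L, R> {
    final L left;
    final R right;

    Pair(L left, R right) {
        this.left = left;
        this.right = right;
    }
}

/** Hypothetical record from a data source, e.g. one database row. */
final class Row {
    final String smiles;
    final double value; // regression value, or one of two class values

    Row(String smiles, double value) {
        this.smiles = smiles;
        this.value = value;
    }
}

/**
 * Adapts an Iterator over rows into an Iterator over (molecule, value) pairs.
 * M is a placeholder for IAtomContainer; the parser converts a SMILES string
 * into a molecule object.
 */
final class RowIterator<M> implements Iterator<Pair<M, Double>> {
    private final Iterator<Row> rows;
    private final Function<String, M> parser;

    RowIterator(Iterator<Row> rows, Function<String, M> parser) {
        this.rows = rows;
        this.parser = parser;
    }

    @Override
    public boolean hasNext() {
        return rows.hasNext();
    }

    @Override
    public Pair<M, Double> next() {
        Row row = rows.next(); // throws NoSuchElementException when exhausted
        return new Pair<>(parser.apply(row.smiles), row.value);
    }
}
```

With CPSign available, the parser would presumably delegate to CPSignFactory.parseSMILES (wrapping any exception it declares), and the resulting iterator would be handed to fromMolsIterator. Rows are converted one at a time, so a large data source does not need to be held in memory.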