This is the Git repository for sparse kernel optimal scoring. The R package sparseKOS performs non-linear binary classification with simultaneous sparse feature selection. The corresponding reference is Lapanowski, Alexander F., and Gaynanova, Irina, "Sparse feature selection in kernel discriminant analysis via optimal scoring", preprint.
library(devtools)
devtools::install_github("aflapan/sparseKOS")
There are two functions in the package sparseKOS.
- The first is SelectParams, which implements the automatic parameter selection methods used in sparse kernel optimal scoring. The user may specify either Sigma alone or both Sigma and Gamma (see the section Hierarchical Parameters for more details). The function returns a list containing the parameter values Sigma, Gamma, and Lambda to be used. Its signature is
SelectParams( Data, Cat, Sigma = NULL, Gamma = NULL)
- The second function is Predict. This function returns a list containing the Weights on the data features and a discriminant vector Dvec, which together solve sparse kernel optimal scoring. If a value X of unlabelled data is supplied, the function uses both Weights and Dvec to predict the class membership of every data point in X. The user may specify the parameter values Sigma, Gamma, and Lambda. However, user-specified parameters must satisfy their hierarchical ordering (see the section Hierarchical Parameters for more detail). If not all of the parameters are specified, the function first runs SelectParams to obtain values for the remaining ones. Its signature is
Predict( X = NULL , Data, Cat, Sigma = NULL, Gamma = NULL, Lambda = NULL)
Sparse kernel optimal scoring has three parameters: a Gaussian kernel parameter Sigma, a ridge parameter Gamma, and a sparsity parameter Lambda. They have a hierarchical dependency, in that Sigma influences Gamma, and both influence Lambda. The ordering is
Top: Sigma
Middle: Gamma
Bottom: Lambda
When using either of the functions, the user is only allowed to specify parameter combinations which adhere to the hierarchical ordering above. That is, they can only input parameters which go from Top to Bottom. For example, they could specify both Sigma and Gamma, but leave Lambda as the default NULL value. On the other hand, the user would not be allowed to specify only Lambda while leaving Sigma and Gamma as their default NULL values.
If the user supplies parameter values that violate the hierarchical ordering, the function reports the error message "Hierarchical order of parameters violated."
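The ordering rule can be sketched as a small check. This is an illustration only, not the package's own code: the helper name respects_hierarchy is hypothetical, and it assumes that skipping a level (for example, giving Sigma and Lambda without Gamma) also counts as a violation.

```r
# Hypothetical helper, not part of sparseKOS: returns TRUE when the
# supplied parameters follow the Sigma -> Gamma -> Lambda ordering.
respects_hierarchy <- function(Sigma = NULL, Gamma = NULL, Lambda = NULL) {
  # TRUE for each parameter the user supplied, ordered Top to Bottom.
  given <- c(!is.null(Sigma), !is.null(Gamma), !is.null(Lambda))
  # A valid pattern never switches from "not given" to "given"
  # (assumption: a skipped level is treated as a violation).
  !any(diff(as.integer(given)) == 1)
}

respects_hierarchy(Sigma = 1.325386, Gamma = 0.07531579)  # allowed
respects_hierarchy(Gamma = 0.07531579)                    # not allowed
```

A check like this can be run before calling SelectParams or Predict to catch an invalid combination early.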
library(sparseKOS)
The examples here use a data set named Data, which ships with the sparseKOS package and is documented there. Data is a list containing the training and test data, TrainData and TestData, along with the corresponding categorical labels CatTrain and CatTest.
We first illustrate examples of the SelectParams function.
To generate all of the parameter values automatically:
SelectParams(Data = Data$TrainData,
             Cat = Data$CatTrain)
The user can specify a value of Sigma, or of both Sigma and Gamma:
SelectParams(Data = Data$TrainData,
             Cat = Data$CatTrain,
             Sigma = 1.325386)
SelectParams(Data = Data$TrainData,
             Cat = Data$CatTrain,
             Sigma = 1.325386,
             Gamma = 0.07531579)
However, the user CANNOT violate the hierarchical ordering. The example
SelectParams(Data = Data$TrainData,
             Cat = Data$CatTrain,
             Gamma = 0.07531579)
produces an error message, because Gamma is specified without Sigma.
We now provide examples of the Predict function.
For an example with pre-specified parameter values:
Sigma <- 1.325386
Gamma <- 0.07531579
Lambda <- 0.002855275
Predict(X = Data$TestData,
        Data = Data$TrainData,
        Cat = Data$CatTrain,
        Sigma = Sigma,
        Gamma = Gamma,
        Lambda = Lambda)
Predict(Data = Data$TrainData,
        Cat = Data$CatTrain,
        Sigma = Sigma,
        Gamma = Gamma,
        Lambda = Lambda)
Note that two different lists are returned in the above examples. The first example returns a list which includes the predicted class memberships for the unlabelled data in X = Data$TestData along with the final weights Weights and discriminant vector Dvec. The second example does not supply a value for the X variable and thus does not return predicted class memberships.
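Because the method performs sparse feature selection, a natural follow-up is to see which features received non-zero weight. The sketch below uses a small stand-in list in place of a real Predict result so that it runs on its own; only the element names Weights and Dvec come from the description above, and the numeric values are made up for illustration.

```r
# Stand-in for the list returned by Predict(); in practice this would be
# fit <- Predict(Data = Data$TrainData, Cat = Data$CatTrain).
# The values below are illustrative, not real output.
fit <- list(Weights = c(0, 0.8, 0, 1),
            Dvec = c(0.1, -0.2, 0.3))

# Indices of the features kept by the sparse fit (non-zero weight).
selected <- which(fit$Weights != 0)

# Number of features retained out of the total.
n_selected <- length(selected)
n_features <- length(fit$Weights)
```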
The user may also leave parameters unspecified. In the following examples, Sigma, Gamma, and Lambda are all left undefined, and the function Predict generates them automatically.
Predict(X = Data$TestData,
        Data = Data$TrainData,
        Cat = Data$CatTrain)
Predict(Data = Data$TrainData,
        Cat = Data$CatTrain)
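Since Data also includes the test labels CatTest, predictions for Data$TestData can be scored against them. The helper below is a hypothetical sketch, not part of sparseKOS, and it uses stand-in vectors so that it runs without the package. How the predicted class memberships are named inside the list returned by Predict is not stated here, so extracting them is left to the reader.

```r
# Hypothetical helper, not part of sparseKOS: fraction of predictions
# that disagree with the true labels.
error_rate <- function(predicted, truth) {
  stopifnot(length(predicted) == length(truth))
  mean(predicted != truth)
}

# Stand-in vectors; in practice `truth` would be Data$CatTest and
# `predicted` the class memberships taken from the Predict output.
predicted <- c(1, 2, 2, 1, 1)
truth     <- c(1, 2, 1, 1, 2)

rate <- error_rate(predicted, truth)
```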