Authors of this version: Nagoya University, Department of Mechanical Systems Engineering, Machine Learning & Data Science Laboratory (Main author: Hiroyuki Hanada)
Original authors of LIBSVM: Chih-Chung Chang and Chih-Jen Lin. (See the file README.LIBSVM.original.)
Original authors of LIBLINEAR: Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. (See the file README.LIBLINEAR.original.)
- The support vector machine (SVM) solvers LIBSVM and LIBLINEAR differ in two respects: the use of kernel functions (only LIBSVM supports them) and the regularization imposed on the intercept term (only LIBLINEAR applies it).
- This library modifies LIBLINEAR (incorporating part of the LIBSVM code) so that SVMs can be trained with both kernel functions and intercept regularization.
- This implementation includes C++ source code and a Python wrapper. (Other languages are not supported; although a "matlab" folder exists, it is not supported.)
- The strategy for caching the kernel matrix is not well optimized. We would like to improve this in the future.
For the support vector machine (SVM) and similar machine learning models, LIBSVM and LIBLINEAR are well-known solvers. However, the author needed formulations slightly different from those provided by these two libraries.
The requirements for the SVM in the author's work (e.g., "Distributionally Robust Coreset Selection under Covariate Shift", published in TMLR) are as follows:
1. specification of weights on training instances (weighting the loss function values),
2. use of kernel functions, and
3. regularization of the intercept term (see the appendix at the end).
Here, LIBSVM satisfies points 1 and 2 but not 3, while LIBLINEAR satisfies points 1 and 3 but not 2. So the author modified LIBLINEAR so that it accepts kernel functions; more specifically, part of the LIBSVM implementation was introduced into the LIBLINEAR code.
- Currently, only the following models accept kernel functions; an error will be raised if a kernel function is specified for any other model (see the example command after this list).
- L2 regularization + hinge loss (command-line option "-s 3"; ordinary SVM)
- L2 regularization + squared hinge loss (command-line option "-s 1")
- L2 regularization + logistic loss (command-line option "-s 7")
- The ordinary LIBSVM/LIBLINEAR implementations do not accept instance weights; versions that do accept them are distributed at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances .
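For example, assuming this library keeps the LIBSVM-style -t flag for the kernel type (-t 2 being the RBF kernel) and the -c flag for the regularization strength, as the log file names such as "model-heart_scale-s7-t0-B1" later in this README suggest, a kernelized training run might look like the following; run ./train without arguments to confirm the exact options:

./train -s 3 -t 2 -c 1 heart_scale model_file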
The following software components are required:
- make
- gcc, g++
- python (If you need Python wrapper)
If you try Python sample codes, the following Python libraries are required:
- numpy
- scipy
- cvxpy
- ecos
- matplotlib
- scikit-learn
For example, if you use Anaconda and would like to create a new environment to run this library, the following commands will do so:
conda create -n liblinearWK -c conda-forge pip numpy scipy cvxpy ecos matplotlib scikit-learn
conda activate liblinearWK
(Since this LIBLINEAR-Weights-Kernelized library is installed via pip, if you create an environment with conda, it is recommended to install pip via conda as well so that the library is installed only in that environment.)
Run the make command in the top folder of this library. Please confirm that it does not end with an error.
If you do not use the Python wrapper, the preparation is complete at this point. The executable files train and predict will be produced; see their command-line options by running them without arguments.
Move to the example_commandline folder of this library and run the following command:
./linear_check.sh
Then the trained model parameters will be saved in the folder example_commandline/linear_check_sh_output.
We can compare the results above with those computed by CVXPY (an optimizer for a large class of convex problems). In the example_commandline folder, run the following commands:
python linear_check.py
python linear_check_comparison.py
Then outputs like the following will be shown. Here "Difference" denotes the L2 norm of the difference between the model parameters trained by LIBLINEAR-Weights-Kernelized and by CVXPY; please confirm that it is sufficiently small.
linear_check_py_output/model-heart_scale-s7-t0-B1-W-1,1.1,2.log.primal
Size: (14,) (14,)
Difference: 3.98385354554017e-08
linear_check_py_output/model-heart_scale-s7-t0-B1.log.primal
Size: (14,) (14,)
Difference: 2.245077146169388e-08
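(For reference, the "Size" and "Difference" lines amount to the following check, sketched here with stand-in arrays rather than the actual saved parameter files.)

```python
import numpy as np

# Stand-in parameter vectors of shape (14,), mimicking the "Size" line.
w_lib = np.linspace(-1.0, 1.0, 14)    # stand-in for this library's parameters
w_cvx = w_lib + 1e-8                  # stand-in for the CVXPY parameters
print(w_lib.shape, w_cvx.shape)       # corresponds to "Size"
print(np.linalg.norm(w_lib - w_cvx))  # corresponds to "Difference"
```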
Move to the python folder and run the make command. Please confirm that it does not end with an error.
In the python folder, confirm that the current Python environment is the one into which LIBLINEAR-Weights-Kernelized should be installed, and run the command pip install -e . (It is recommended to create a separate environment and install the library there.)
If the command ends without an error, confirm the installation by running the Python code from liblinear import liblinear, liblinearutil, commonutil. (This is easy to try in the Python interactive mode.)
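As a further check, a minimal training run can be attempted. The sketch below is only an assumption-laden example: it presumes the wrapper keeps the standard LIBLINEAR Python API (svm_read_problem, problem, parameter, train, predict) and that the heart_scale data file is reachable at the given path; weighted distributions may instead require an explicit weight list, e.g. problem(W, y, x).

```python
# Minimal sketch, assuming the standard LIBLINEAR Python API is kept by
# this wrapper; confirm the names against the installed package.
from liblinear.liblinearutil import svm_read_problem, problem, parameter, train, predict

y, x = svm_read_problem('../heart_scale')  # data path is an assumption
prob = problem(y, x)                       # weighted variants may need problem(W, y, x)
param = parameter('-s 3 -c 1')             # L2 regularization + hinge loss (dual)
model = train(prob, param)
p_labels, p_acc, p_vals = predict(y, x, model)
```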
Move to the example_python folder and run the following command:
python comparison_CVXPY_LIBLINEAR.py ../splice_scale 0.1 2 1
This trains an SVM on the data file "../splice_scale" with regularization strength 0.1, weight 2 for positive instances, and weight 1 for negative instances. It then compares the result against a CVXPY implementation in terms of both computation time and trained model parameters.
The script also plots the model parameters trained by this implementation and by CVXPY, sorted in ascending order of the former. You will find that they are almost identical.
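For reference, the objective believed to be solved in this comparison (a weighted hinge loss with the intercept included in the L2 regularization; see the appendix below) can be written in CVXPY roughly as follows. This is a minimal self-contained sketch with synthetic stand-in data, not the repository's script.

```python
import numpy as np
import cvxpy as cp

# Synthetic stand-in data; the actual script reads "../splice_scale".
rng = np.random.default_rng(0)
n, d, C = 40, 5, 0.1
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
w = np.where(y > 0, 2.0, 1.0)            # weights 2 / 1, as in the command above

Xbar = np.hstack([X, np.ones((n, 1))])   # augmented design: intercept as a feature
beta = cp.Variable(d + 1)                # the last element plays the role of b
hinge = cp.pos(1 - cp.multiply(y, Xbar @ beta))
objective = C * cp.sum(cp.multiply(w, hinge)) + 0.5 * cp.sum_squares(beta)
cp.Problem(cp.Minimize(objective)).solve(solver=cp.ECOS)
print(beta.value)
```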
The computation times on the author's computer were as follows.
| Kernel | Optimization over | Training by CVXPY (s) | Training by this implementation (s) |
|---|---|---|---|
| Linear | Primal variables | 1.1540 | 0.1979 |
| Linear | Dual variables | 1.5910 | 1.4058 |
| RBF | Dual variables | 12.4578 | 0.0641 |
(Note that optimization over the primal variables is possible only with the linear kernel.)
(See also: the official documentation of LIBSVM and LIBLINEAR.)
Suppose a linear prediction model for binary classification, that is, given a feature vector $\boldsymbol{x}\in\mathbb{R}^d$, an outcome variable $y\in\{-1,+1\}$ is predicted as
$y\approx \mathrm{sign}(\boldsymbol{x}^\top\boldsymbol{\gamma} + b)$ ,
where $\boldsymbol{\gamma}\in\mathbb{R}^d$ is the coefficient vector and $b\in\mathbb{R}$ is the intercept.
Given a training dataset $X\in\mathbb{R}^{n\times d}$, $\boldsymbol{y}\in\{-1,+1\}^n$ with instance weights $w_1,\dots,w_n>0$, the training is conducted as
$\mathrm{argmin}_{\boldsymbol{\gamma}\in\mathbb{R}^d, b\in\mathbb{R}} C\sum_{i=1}^n w_i \ell(y_i, X_{i:}\boldsymbol{\gamma} + b) + \rho(\boldsymbol{\gamma})$ ,
where $\ell$ is a loss function, $\rho$ is a regularization function, $C>0$ controls the regularization strength, and $X_{i:}$ denotes the $i$-th row of $X$. Note that the regularization is imposed only on $\boldsymbol{\gamma}$, not on the intercept $b$; this is the formulation of LIBSVM.
Here, we may use an alternative formulation. Let $\bar{X} := [X~\boldsymbol{1}_n]\in\mathbb{R}^{n\times(d+1)}$ be $X$ with a column of ones appended, and let $\boldsymbol{\beta}\in\mathbb{R}^{d+1}$ be the parameter vector.
Then the training is conducted as
$\mathrm{argmin}_{\boldsymbol{\beta}\in\mathbb{R}^{d+1}} C\sum_{i=1}^n w_i \ell(y_i, \bar{X}_{i:}\boldsymbol{\beta}) + \bar{\rho}(\boldsymbol{\beta})$ .
Here, $\bar{\rho}$ is a regularization function on $\mathbb{R}^{d+1}$.
From the first term, we can interpret that $\boldsymbol{\beta} = [\boldsymbol{\gamma}^\top~b]^\top$,
that is, the last element of $\boldsymbol{\beta}$ plays the role of the intercept $b$.
The difference of this formulation from the previous one is that the penalty by the regularization function is imposed on the whole of $\boldsymbol{\beta}$, including the intercept; this is the formulation of LIBLINEAR.
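The identity used above, that $\bar{X}_{i:}\boldsymbol{\beta} = X_{i:}\boldsymbol{\gamma} + b$ when $\boldsymbol{\beta} = [\boldsymbol{\gamma}^\top~b]^\top$, can be checked numerically; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))
gamma = rng.normal(size=d)
b = 0.7

Xbar = np.hstack([X, np.ones((n, 1))])   # append an all-ones column to X
beta = np.append(gamma, b)               # beta = [gamma; b]

# X_{i:} gamma + b equals Xbar_{i:} beta for every instance i
assert np.allclose(X @ gamma + b, Xbar @ beta)
```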