Large-Scale Fuzzy Least Squares Twin SVMs for Class Imbalance Learning
Authors: M. A. Ganaie, M. Tanveer and C. -T. Lin
Publication: IEEE Transactions on Fuzzy Systems (TFS)
Issue: Volume 30, Issue 11 – November 2022
Abstract: Twin support vector machines (TSVMs) have been successfully employed for binary classification problems. With the advent of machine learning algorithms, data have proliferated and there is a need to handle or process large-scale data. TSVMs are not successful in handling large-scale data due to the following: 1) the optimization problem solved in the TSVM needs to calculate large matrix inverses, which makes it an ineffective choice for large-scale problems; 2) the empirical risk minimization principle is employed in the TSVM and, hence, may suffer due to overfitting; and 3) the Wolfe dual of TSVM formulation involves positive-semidefinite matrices, and hence, singularity issues need to be resolved manually. Keeping in view the aforementioned shortcomings, in this article, we propose a novel large-scale fuzzy least squares TSVM for class imbalance learning (LS-FLSTSVM-CIL). We formulate the LS-FLSTSVM-CIL such that the proposed optimization problem ensures that: 1) no matrix inversion is involved in the proposed LS-FLSTSVM-CIL formulation, which makes it an efficient choice for large-scale problems; 2) the structural risk minimization principle is implemented, which avoids the issues of overfitting and results in better performance; and 3) the Wolfe dual formulation of the proposed LS-FLSTSVM-CIL model involves positive-definite matrices. In addition, to resolve the issues of class imbalance, we assign fuzzy weights in the proposed LS-FLSTSVM-CIL to avoid bias in dominating the samples of class imbalance problems. To make it more feasible for large-scale problems, we use an iterative procedure known as the sequential minimization principle to solve the objective function of the proposed LS-FLSTSVM-CIL model. From the experimental results, one can see that the proposed LS-FLSTSVM-CIL demonstrates superior performance in comparison to baseline classifiers. To demonstrate the feasibility of the proposed LS-FLSTSVM-CIL on large-scale classification problems, we evaluate the classification models on the large-scale normally distributed clustered (NDC) dataset. To demonstrate the practical applications of the proposed LS-FLSTSVM-CIL model, we evaluate it for the diagnosis of Alzheimer’s disease and breast cancer disease. Evaluation on NDC datasets shows that the proposed LS-FLSTSVM-CIL has feasibility in large-scale problems as it is fast in comparison to the baseline classifiers.
Index Terms: Class imbalance, machine learning, maximum margin, support vector machines (SVMs), twin support vector machine (TSVM).
IEEE Xplore Link: https://ieeexplore.ieee.org/document/9740453