Combining B&B-based hybrid feature selection and the imbalance-oriented multiple-classifier ensemble for imbalanced credit risk assessment
Abstract
An ideal model for credit risk assessment should select important features and handle imbalanced data sets effectively. This paper proposes an integrated method that combines B&B (branch and bound)-based hybrid feature selection (BBHFS) with the imbalance-oriented multiple-classifier ensemble (IOMCE) for imbalanced credit risk assessment, using the support vector machine (SVM) and multiple discriminant analysis (MDA) as base predictors. BBHFS is a hybrid feature selection method that integrates the t-test and B&B with k-fold cross-validation to search for a satisfactory feature subset. The IOMCE divides the majority samples into several subsets and combines each subset with the minority samples to construct several training sets, on which a multiple-classifier ensemble model is built. We conduct the main experiments on a 1:3 imbalanced corporate credit risk data set with continuous features, and extended experiments on a 1:5 imbalanced data set with continuous features and a 1:3 imbalanced data set with discrete and nominal features. For an empirical comparison, we combine no feature selection and five feature selection methods (pure B&B, factor analysis, the pure t-test, the t-test with correlation analysis, and BBHFS) with both a single classifier and the IOMCE to construct SVM and MDA models. When all features are continuous, the BBHFS-IOMCE method generally outperforms all the other methods. More specifically, BBHFS provides more stable and satisfactory results than the other feature selection methods, and compared with single-classifier models, IOMCE models significantly enhance the recognition rate for minority samples while incurring only a small reduction in the recognition rate for majority samples and maintaining an acceptable overall accuracy. When the features are mostly discrete or nominal, the IOMCE method retains its ability to deal with an imbalanced data set, although the five feature selection methods have no significant advantage over no feature selection. This suggests that BBHFS is effective in retaining useful information when reducing the dimensionality of continuous features and that the BBHFS-IOMCE method is an important tool for imbalanced credit risk assessment.
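As an illustration of the IOMCE idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It splits the majority samples into disjoint subsets, pairs each subset with all minority samples, trains one base classifier per training set, and combines the predictions. Several points are assumptions made only for this sketch: the function and class names are hypothetical, a nearest-centroid rule stands in for the SVM and MDA base predictors, majority voting is assumed as the combination rule (the abstract does not state one), and the toy data are invented to mimic a 1:3 imbalance.

```python
import random
from collections import Counter


def build_balanced_training_sets(majority, minority, n_subsets, seed=0):
    """Split the majority samples into n_subsets disjoint parts and pair
    each part with all minority samples (hypothetical helper)."""
    if not 1 <= n_subsets <= len(majority):
        raise ValueError("n_subsets must be between 1 and len(majority)")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    # Striding yields disjoint parts whose sizes differ by at most one.
    parts = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [(part, list(minority)) for part in parts]


def _mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


class CentroidClassifier:
    """Stand-in base predictor (an assumption, NOT the paper's SVM or MDA):
    assigns a sample to the class with the nearest centroid."""

    def fit(self, majority_part, minority):
        # Label 0 = majority class, label 1 = minority class.
        self.centroids = {0: _mean(majority_part), 1: _mean(minority)}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: _sq_dist(x, self.centroids[c]))


def ensemble_predict(models, x):
    """Combine base predictions by majority vote (assumed combination rule)."""
    votes = Counter(model.predict(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data with a 1:3 minority-to-majority ratio (invented for illustration).
majority = [[float(i % 3), float(i % 4)] for i in range(12)]
minority = [[5.0 + 0.1 * i, 5.0] for i in range(4)]

# Three subsets of four majority samples each, so every training set is 1:1.
training_sets = build_balanced_training_sets(majority, minority, n_subsets=3)
models = [CentroidClassifier().fit(maj, mino) for maj, mino in training_sets]

print(ensemble_predict(models, [5.0, 5.0]))
print(ensemble_predict(models, [0.5, 0.5]))
```

With a 1:3 imbalance, three subsets give each base classifier a balanced 1:1 training set. An odd number of base classifiers avoids ties in a two-class vote.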
Keywords: imbalanced credit risk assessment, imbalanced data set, hybrid feature selection, imbalance-oriented multiple-classifier ensemble
This work is licensed under a Creative Commons Attribution 4.0 International License.