A cost-sensitive logistic regression credit scoring model based on multi-objective optimization approach

Feng Shen; Run Wang; Yu Shen

doi:10.3846/tede.2019.11337

Abstract

Credit scoring is an important process for peer-to-peer (P2P) lending companies because it determines whether loan applicants are likely to default. Most credit scoring models aim to minimize the classification error rate, which implies that all classification errors bear the same cost; in reality, however, the costs of different misclassifications in credit scoring differ significantly. Therefore, this paper proposes a new cost-sensitive logistic regression credit scoring model, based on a multi-objective optimization approach, that has two objectives in the cost-sensitive logistic regression process. The cost-sensitive logistic regression parameters are solved using a multi-objective particle swarm optimization (MOPSO) algorithm. In the empirical analysis, the proposed model was applied to the credit scoring of a well-known Chinese P2P company. Compared with other common credit scoring models, the proposed model effectively reduced type II error rates and total classification error costs, and improved the AUC, the F1 value (the harmonic mean of recall and precision), and the G-mean. The proposed model was also compared with other multi-objective optimization algorithms to further demonstrate that MOPSO is the best approach for cost-sensitive logistic regression credit scoring models.
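As a rough illustration of the evaluation measures named in the abstract, the following minimal sketch computes the type II error rate, the total misclassification cost, the F1 value, and the G-mean from confusion-matrix counts. It is not the authors' implementation. The function name `credit_metrics`, the convention that a defaulting applicant is the positive class (so an accepted defaulter counts as the type II error), and the unit costs `cost_fn` and `cost_fp` are all assumptions made for this example, as are the sample counts.

```python
import math


def credit_metrics(tp, fn, fp, tn, cost_fn=5.0, cost_fp=1.0):
    """Compute evaluation measures from confusion-matrix counts.

    Assumptions (not taken from the paper):
    - the positive class is "applicant defaults", so a false negative
      (a defaulter predicted as non-defaulter) is the type II error;
    - cost_fn and cost_fp are hypothetical unit costs per error;
    - all denominators are non-zero.
    """
    recall = tp / (tp + fn)          # share of defaulters that were caught
    precision = tp / (tp + fp)       # share of predicted defaulters that defaulted
    specificity = tn / (tn + fp)     # share of good applicants correctly accepted

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    # G-mean is the geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)

    return {
        "type2_error_rate": fn / (tp + fn),
        "total_cost": cost_fn * fn + cost_fp * fp,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts, chosen only to exercise the function.
m = credit_metrics(tp=80, fn=20, fp=30, tn=870)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Weighting false negatives more heavily than false positives reflects the abstract's point that classification errors do not all bear the same cost; the 5:1 ratio used here is purely illustrative.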

First published online 27 November 2019

Keywords: credit scoring, cost-sensitive, logistic regression, multi-objective optimization, P2P

How to Cite

Shen, F., Wang, R., & Shen, Y. (2020). A cost-sensitive logistic regression credit scoring model based on multi-objective optimization approach. Technological and Economic Development of Economy, 26(2), 405-429. https://doi.org/10.3846/tede.2019.11337

Published in Issue

Feb 3, 2020

This work is licensed under a Creative Commons Attribution 4.0 International License.
