Journal: Applied Soft Computing

A Bolasso based consistent feature selection enabled random forest classification algorithm: An application to credit risk assessment



Abstract

Credit risk assessment is a crucial task because it forecasts whether an individual will default on a loan. Classifying an applicant as a good or bad debtor helps the lender make a sound decision. Modern data mining and machine learning techniques have proven useful and accurate for credit risk prediction and decision making. Classification is one of the most widely used techniques in machine learning. To increase the prediction accuracy of standalone classifiers while keeping overall cost to a minimum, feature selection techniques are used, since feature selection removes redundant and irrelevant attributes from the dataset. This paper first introduces Bolasso (Bootstrap-Lasso), which selects consistent and relevant features from a pool of features. Consistent feature selection is defined as the robustness of the selected features with respect to changes in the dataset. The features shortlisted by Bolasso are then applied to several classification algorithms, such as Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB) and K-Nearest Neighbors (K-NN), to test their predictive accuracy. The Bolasso-enabled Random Forest algorithm (BS-RF) is observed to provide the best results for credit risk evaluation. The classifiers are built on a 70:30 training and test partition of three datasets (Lending Club's peer-to-peer dataset, Kaggle's Bank loan status dataset and the German credit dataset obtained from UCI). The performance of the Bolasso-enabled classification algorithms is then compared with that of other baseline feature selection methods, such as Chi Square, Gain Ratio and ReliefF, and with stand-alone classifiers (no feature selection method applied). The experimental results show that Bolasso yields much more stable feature selections than the other methods. The Jaccard Stability Measure (JSM) is used to assess the stability of the feature selection methods. Moreover, BS-RF has good classification accuracy and outperforms the other methods in terms of AUC and accuracy, effectively improving the decision-making process of lenders. (C) 2019 Elsevier B.V. All rights reserved.
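
The abstract names two ideas without spelling them out: keeping only features that are selected consistently across bootstrap Lasso fits, and scoring that consistency with a Jaccard-based stability measure. The sketch below illustrates both under stated assumptions. The Lasso fits themselves are not shown, the feature names are hypothetical, and the stability score is the mean pairwise Jaccard index, which is one common definition and may differ from the exact JSM formula used in the paper. The `min_fraction` parameter is likewise an illustrative knob, with 1.0 corresponding to a strict intersection of the selected supports.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def jaccard_stability(subsets):
    """Mean pairwise Jaccard index over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def consistent_features(subsets, min_fraction=1.0):
    """Keep features selected in at least min_fraction of the runs.

    min_fraction=1.0 is a strict intersection of the supports.
    """
    counts = {}
    for subset in subsets:
        for feature in set(subset):
            counts[feature] = counts.get(feature, 0) + 1
    needed = min_fraction * len(subsets)
    return {f for f, c in counts.items() if c >= needed}


# Hypothetical supports from three bootstrap Lasso fits (illustrative names,
# not taken from the paper's datasets).
runs = [
    {"income", "loan_amount", "term"},
    {"income", "loan_amount", "age"},
    {"income", "loan_amount", "term"},
]
print(consistent_features(runs))
print(jaccard_stability(runs))
```

A stability score near 1 means the bootstrap replicates agree on almost the same features, while a score near 0 means the selection changes heavily with the sample. The surviving feature set would then be passed to a classifier such as Random Forest on the 70:30 split the abstract describes.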
