Source: Journal of Modelling in Management

Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines


Abstract

Purpose - Credit scoring datasets are generally unbalanced: the number of repaid loans is higher than that of defaulted ones. Classification of these data is therefore biased toward the majority class, which in practice means that a classifier tends to assign a mistaken "good borrower" status even to "very risky borrowers". In addition to the use of statistical and machine learning classifiers, this paper aims to explore the relevance and performance of sampling models combined with statistical prediction and artificial intelligence techniques to predict and quantify the default probability based on real-world credit data.

Design/methodology/approach - A real database from a Tunisian commercial bank was used, and the unbalanced data issue was addressed with random over-sampling (ROS) and the synthetic minority over-sampling technique (SMOTE). Performance was evaluated in terms of the confusion matrix and the receiver operating characteristic curve.

Findings - The results indicated that the combination of intelligent and statistical techniques with resampling approaches is promising for default rate management and provides accurate credit risk estimates.

Originality/value - This paper empirically investigates the effectiveness of ROS and SMOTE in combination with logistic regression, artificial neural networks and support vector machines. The authors address the role of sampling strategies in the Tunisian credit market and their impact on credit risk. These sampling strategies may help financial institutions to reduce misclassification costs in comparison with the unbalanced original data and may serve as a means of improving the bank's performance and competitiveness.
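The two resampling ideas named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the toy data, the feature meanings, the function names and the parameter choices (k, seed) are all assumptions made for illustration, and the SMOTE function is a simplified version of the technique. The last lines show why the confusion matrix matters on unbalanced data: a classifier that labels every loan as repaid looks acceptable on accuracy yet detects no defaults.

```python
import random

def random_oversample(X, y, minority=1, seed=0):
    """ROS: duplicate randomly chosen minority rows until both classes match in size."""
    rng = random.Random(seed)
    rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(rows)) - len(rows)
    extra = [rng.choice(rows) for _ in range(max(n_needed, 0))]
    return X + extra, y + [minority] * len(extra)

def smote(X, y, minority=1, k=3, seed=0):
    """Simplified SMOTE: interpolate between a minority row and one of its
    k nearest minority neighbours (needs at least two minority rows)."""
    rng = random.Random(seed)
    rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(rows)) - len(rows)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(rows)
        # Rank the other minority rows by squared Euclidean distance to `base`.
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [minority] * len(synthetic)

def confusion_matrix(y_true, y_pred, positive=1):
    """Counts of true/false positives and negatives, with 'default' as positive."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(t == positive and p == positive for t, p in pairs),
        "fp": sum(t != positive and p == positive for t, p in pairs),
        "fn": sum(t == positive and p != positive for t, p in pairs),
        "tn": sum(t != positive and p != positive for t, p in pairs),
    }

# Toy, made-up data (NOT from the paper): 9 repaid loans (0), 3 defaults (1).
# The two features are hypothetical, e.g. [debt ratio, years as a customer].
X = [[0.2, 5.0], [0.3, 7.0], [0.1, 9.0], [0.25, 6.0], [0.15, 8.0],
     [0.35, 4.0], [0.3, 5.5], [0.2, 7.5], [0.1, 6.5],
     [0.8, 1.0], [0.9, 2.0], [0.7, 1.5]]
y = [0] * 9 + [1] * 3

X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote(X, y)
print(y_ros.count(0), y_ros.count(1))  # 9 9
print(y_sm.count(0), y_sm.count(1))    # 9 9

# A classifier that always predicts "repaid" is right on 9 of 12 loans
# but misses every default (tp = 0, fn = 3).
always_repaid = [0] * len(y)
cm = confusion_matrix(y, always_repaid)
print(cm)
```

In a real workflow, resampling would normally be applied to the training split only, so that evaluation still reflects the original class distribution.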
