Journal: Expert Systems with Applications

Multivariable data imputation for the analysis of incomplete credit data


Abstract

Missing data significantly reduce the accuracy and usability of credit scoring models, especially when several variables are missing at once (the multivariate missing case). Most credit scoring models address this problem by deleting the incomplete instances from the dataset or by imputing missing values with the mean, the mode, or regression estimates. However, these methods often cause a significant loss of information or introduce bias. We propose a novel method called BNII to impute missing values, which can be helpful for intelligent credit scoring systems. The BNII algorithm consists of two stages: a preparatory stage and an imputation stage. In the first stage, a Bayesian network over all of the attributes in the original dataset is constructed from the complete dataset, so that both the network structure, which encodes the dependencies between variables, and the parameters of each variable's conditional distribution are learned. In the second stage, the variables with missing values are iteratively imputed using the Bayesian network learned in the first stage. The algorithm was found to be monotonically convergent. The method has two main advantages: it exploits the inherent probabilistic dependencies between variables without assuming a specific probability distribution, and it is suitable for multivariate missing cases. Three datasets were used in the experiments: a real dataset from a well-known P2P financial company in China and two benchmark datasets provided by UCI. The experimental results showed that BNII performed significantly better than other well-known imputation techniques. This suggests that the proposed method can improve the performance of a credit scoring system and can be extended to other expert and intelligent systems. (C) 2019 Elsevier Ltd. All rights reserved.
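
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' BNII implementation, whose details the abstract does not give. The sketch makes several assumptions: the network structure is fixed by hand rather than learned, all variables are discrete, and each missing variable is updated to its most probable value given its parents and children. That update rule is a stand-in, not the published one. The names `PARENTS`, `VALUES`, `fit_cpts`, `blanket_score`, and `impute`, as well as the toy credit records, are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy structure (assumed, NOT learned): variable -> tuple of parents.
# Variables must be listed in topological order (parents before children).
PARENTS = {"income": (), "employed": ("income",), "default": ("income", "employed")}
VALUES = {"income": ("low", "high"), "employed": ("no", "yes"), "default": ("no", "yes")}


def fit_cpts(complete_rows, alpha=1.0):
    """Stage 1 (parameters only): estimate P(var | parents) by Laplace-smoothed counting."""
    counts = defaultdict(Counter)
    for row in complete_rows:
        for var, parents in PARENTS.items():
            counts[(var, tuple(row[p] for p in parents))][row[var]] += 1

    def prob(var, value, row):
        c = counts[(var, tuple(row[p] for p in PARENTS[var]))]
        return (c[value] + alpha) / (sum(c.values()) + alpha * len(VALUES[var]))

    return prob


def blanket_score(prob, var, value, row):
    """Unnormalised P(var=value | all other variables): own CPT term times children's terms."""
    trial = dict(row, **{var: value})
    score = prob(var, value, trial)
    for child, parents in PARENTS.items():
        if var in parents:
            score *= prob(child, trial[child], trial)
    return score


def impute(row, prob, max_iters=20):
    """Stage 2: re-impute each missing variable in turn until no value changes."""
    missing = [v for v in PARENTS if row[v] is None]
    filled = dict(row)
    for v in missing:  # initial guess from parents only; topological order keeps parents set
        filled[v] = max(VALUES[v], key=lambda x: prob(v, x, filled))
    for _ in range(max_iters):
        changed = False
        for v in missing:
            best = max(VALUES[v], key=lambda x: blanket_score(prob, v, x, filled))
            if best != filled[v]:
                filled[v], changed = best, True
        if not changed:
            break
    return filled


# Hypothetical complete records used to fit the conditional probability tables.
rows = (
    [{"income": "high", "employed": "yes", "default": "no"}] * 3
    + [{"income": "low", "employed": "no", "default": "yes"}] * 3
    + [{"income": "low", "employed": "yes", "default": "no"}]
)
prob = fit_cpts(rows)
print(impute({"income": "low", "employed": None, "default": None}, prob))
print(impute({"income": None, "employed": "yes", "default": "no"}, prob))
```

In this sketch every update picks the value that maximises the row's joint probability under the fitted tables while the other variables are held fixed, so an update cannot lower that probability, and `max_iters` bounds the loop in any case. This is only an informal analogue of the monotone convergence reported in the abstract. A fuller treatment would also learn the structure from data, which this example skips.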
