Multivariable data imputation for the analysis of incomplete credit data

Lan Qiujun; Xu Xuqing; Ma Haojie; Li Gang

Abstract

Missing data significantly reduce the accuracy and usability of credit scoring models, especially when several variables are missing at once. Most credit scoring models address this problem by deleting the incomplete instances from the dataset or by imputing missing values with the mean, mode, or regression values. However, these methods often result in a significant loss of information or in bias. We proposed a novel method called BNII to impute missing values, which can be helpful for intelligent credit scoring systems. The proposed BNII algorithm consisted of two stages: the preparatory stage and the imputation stage. In the first stage, a Bayesian network with all of the attributes in the original dataset was constructed from the complete dataset, so that both the network structure, which implies the dependencies between variables, and the parameters of each variable's conditional distribution could be learned. In the second stage, the multiple variables with missing values were iteratively imputed using the Bayesian network models from the first stage. The algorithm was found to be monotonically convergent. The main advantages of the method are that it exploits the inherent probabilistic dependencies between variables without assuming a specific probability distribution, and that it is suitable for multivariate missing cases. Three datasets were used for experiments: one was a real dataset from a famous P2P financial company in China, and the other two were benchmark datasets provided by UCI. The experimental results showed that BNII performed significantly better than other well-known imputation techniques. This suggested that the proposed method can be used to improve the performance of a credit scoring system and can be extended to other expert and intelligent systems. © 2019 Elsevier Ltd. All rights reserved.
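
The two stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' BNII implementation, and the abstract does not give enough detail to reproduce it. The sketch rests on several assumptions: the variables are discrete, the network structure (`PARENTS`) is fixed by hand instead of being learned, the conditional tables are estimated with Laplace smoothing, and missing values are updated one at a time by picking the value that maximises the joint probability of the row (iterated conditional modes, swapped in here as a simple stand-in for the paper's iterative step). The variable names and the toy rows are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, hand-fixed structure: job -> income -> {card, default}.
# A real pipeline would learn this structure from the complete rows.
PARENTS = {"job": (), "income": ("job",), "card": ("income",), "default": ("income",)}
DOMAINS = {
    "job": ("stable", "unstable"),
    "income": ("low", "high"),
    "card": ("no", "yes"),
    "default": ("no", "yes"),
}

def learn_cpts(complete_rows, parents, domains, alpha=1.0):
    """Stage 1: estimate P(var | parents) from complete rows (Laplace smoothing)."""
    counts = {v: defaultdict(Counter) for v in parents}
    for row in complete_rows:
        for v, pa in parents.items():
            counts[v][tuple(row[p] for p in pa)][row[v]] += 1

    def prob(v, value, row):
        seen = counts[v][tuple(row[p] for p in parents[v])]
        return (seen[value] + alpha) / (sum(seen.values()) + alpha * len(domains[v]))

    return prob

def joint_logp(row, parents, prob):
    """Log of the joint probability of a fully filled row under the network."""
    return sum(math.log(prob(v, row[v], row)) for v in parents)

def impute(data, parents, domains, max_sweeps=20):
    """Stage 2: fill missing values (None), then refine them sweep by sweep."""
    complete = [r for r in data if None not in r.values()]
    prob = learn_cpts(complete, parents, domains)
    # Start from the per-variable mode of the complete rows.
    modes = {v: Counter(r[v] for r in complete).most_common(1)[0][0] for v in parents}
    missing = [[v for v in parents if r[v] is None] for r in data]
    filled = [{v: modes[v] if r[v] is None else r[v] for v in parents} for r in data]
    for _ in range(max_sweeps):
        changed = False
        for row, miss in zip(filled, missing):
            for v in miss:
                # Choose the value of v that maximises the row's joint probability,
                # holding every other variable at its current value.
                best = max(domains[v], key=lambda x: joint_logp({**row, v: x}, parents, prob))
                if best != row[v]:
                    row[v], changed = best, True
        if not changed:  # no value moved during a full sweep: converged
            break
    return filled

# Toy data (invented): seven complete rows and one row missing two variables.
data = (
    [{"job": "unstable", "income": "low", "card": "no", "default": "yes"}] * 4
    + [{"job": "stable", "income": "high", "card": "yes", "default": "no"}] * 3
    + [{"job": "stable", "income": None, "card": "yes", "default": None}]
)
imputed = impute(data, PARENTS, DOMAINS)
print(imputed[-1])
```

On this toy data the incomplete row starts from the mode-based guess (income low, default yes) and is moved to income high, default no within two sweeps. Each single-variable update never lowers the joint log-probability of the row, which gives a loose intuition for why such iterative schemes settle down, although it says nothing about the convergence argument in the paper itself.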

Bibliographic details

Source
Expert Systems with Applications, 2020, Issue 3, pp. 112926.1-112926.12 (12 pages)
Authors
Lan Qiujun; Xu Xuqing; Ma Haojie; Li Gang
Author affiliations

Hunan Univ Business Sch Changsha 410082 Hunan Peoples R China;

Deakin Univ Sch Informat Technol Geelong Vic 3216 Australia|Chinese Acad Sci Xinjiang Tech Inst Phys & Chem Urumqi 830011 Peoples R China;

Original format: PDF
Language: English
Keywords
Bayesian network; Credit scoring; Data missing; Data mining

Similar literature

1. Multiple imputation for analysis of incomplete data in distributed health data networks [J]. Changgee Chang, Yi Deng, Xiaoqian Jiang. Nature Communications, 2020, Issue 1
2. Pre-processing of incomplete spectrum sensing data in spectrum sensing data falsification attacks detection: a missing data imputation approach [J]. Junnan Yao, Jianjun Cao, Qibin Zheng. IET Communications, 2016, Issue 11
3. JointAI: Joint Analysis and Imputation of Incomplete Data in R [J]. Nicole S. Erler, Dimitris Rizopoulos, Emmanuel M.E.H. Lesaffre. Journal of Statistical Software, 2021, Issue 20
4. An improved parallel matrix factorization method for data imputation of multivariable time series data with high level noises [C]. Wang Yuqi, Lv Zheng, Gao Lei. Chinese Automation Congress, 2020
5. Handling Incomplete High-Dimensional Multivariate Longitudinal Data with Mixed Data Types by Multiple Imputation Using a Longitudinal Factor Analysis Model [D]. Lu, Xiang. 2016
6. Genetic Diversity Analysis of Highly Incomplete SNP Genotype Data with Imputations: An Empirical Assessment [O]. Yong-Bi Fu. 2014
7. On using multiple imputation for exploratory factor analysis of incomplete data [O]. Vahid Nassiri, Anikó Lovik, Geert Molenberghs. 2018
