A First Approach on Big Data Missing Values Imputation

Abstract

Although most techniques and algorithms assume that the data is accurate, measurements in our analog world are far from perfect. As our capacity to store and process data grows every day, these imperfections accumulate, leading to poorer decisions and hindering any knowledge extraction process carried out on the raw data. One of the most troublesome imperfections is the presence of missing values. Many inductive algorithms assume that the data is complete; when they encounter missing data, they either fail to work properly or extract lower-quality knowledge. At present, no sophisticated missing-value treatment is implemented in any major Big Data framework. In this contribution, we present two novel clustering-based imputation methods that can be easily ported to Spark's MLlib and that achieve better results than simply removing the faulty examples or filling in the missing values with the mean.
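
To make the comparison in the abstract concrete, the following is a minimal single-machine sketch of the mean-filling baseline next to one possible cluster-based imputation. It is not the authors' algorithm, which this record does not describe: the function names (`mean_impute`, `kmeans_impute`), the use of `None` as the missing-value marker, the farthest-first initialization, and the toy data are all assumptions made for illustration, and nothing here uses Spark.

```python
def mean_impute(rows):
    """Baseline: replace each missing value (None) with its column mean.

    Assumes every column has at least one observed value.
    """
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]


def _partial_sq_dist(row, centroid):
    """Squared Euclidean distance over the observed features of row only."""
    return sum((v - c) ** 2 for v, c in zip(row, centroid) if v is not None)


def kmeans_impute(rows, k, iterations=20):
    """Hypothetical cluster-based imputation (illustrative assumption).

    Each missing entry is replaced by the matching coordinate of the
    centroid its row is assigned to; assignment and centroid updates
    alternate for a fixed number of iterations.
    """
    filled = mean_impute(rows)  # initial guess for the missing entries

    # Deterministic farthest-first initialization on the mean-filled data.
    centroids = [list(filled[0])]
    while len(centroids) < k:
        far = max(filled,
                  key=lambda r: min(_partial_sq_dist(r, c) for c in centroids))
        centroids.append(list(far))

    for _ in range(iterations):
        # Assign every row to its nearest centroid, using observed features only.
        labels = [min(range(k),
                      key=lambda c: _partial_sq_dist(r, centroids[c]))
                  for r in rows]
        # Recompute each centroid from the currently filled rows.
        for c in range(k):
            members = [filled[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
        # Re-impute the missing entries from the assigned centroid.
        filled = [[centroids[labels[i]][j] if v is None else v
                   for j, v in enumerate(r)]
                  for i, r in enumerate(rows)]
    return filled


# Toy data (made up): two well-separated groups, one missing value in each.
rows = [
    [1.0, 1.0], [1.2, 0.8], [0.9, None],
    [10.0, 10.0], [10.2, 9.8], [None, 10.1],
]
print(mean_impute(rows))
print(kmeans_impute(rows, k=2))
```

On data like this, the column mean falls between the two groups, whereas the cluster-based fill stays close to the group each incomplete row belongs to. That is the intuition behind preferring clustering over the mean, although how well it holds on real data depends on how cleanly the data clusters.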

Bibliographic record

Source
International Conference on Internet of Things, Big Data and Security | 2019 | 1 (CD-ROM) | 9 pages
Authors
Besay Montesdeoca; Julian Luengo; Jesus Maillo; Diego Garcia-Gil; Salvador Garcia; Francisco Herrera
Original format: PDF
Chinese Library Classification: TP393-53
Keywords
Big data; Missing values; Imputation; K-Means; Fuzzy k-Means

Similar literature

1. A New Imputation Algorithm Based Approach for Missing Attribute Values in Databases: An Experimental Approach [J]. Madhu G. International Journal of Artificial Intelligence and Knowledge Discovery. 2013, No. 4
2. A Bayesian vector autoregression-based data analytics approach to enable irregularly-spaced mixed-frequency traffic collision data imputation with missing values [J]. Li Zhenning, Yu Hao, Zhang Guohui. Transportation research. 2019, issue Nova
3. A NONPARAMETRIC MULTIPLE IMPUTATION APPROACH FOR DATA WITH MISSING COVARIATE VALUES WITH APPLICATION TO COLORECTAL ADENOMA DATA [J]. Chiu-Hsieh Hsu, Qi Long, Yisheng Li. Journal of biopharmaceutical statistics. 2014, No. 3
4. A First Approach on Big Data Missing Values Imputation [C]. Besay Montesdeoca, Julian Luengo, Jesus Maillo. International Conference on Internet of Things, Big Data and Security. 2019
5. Multiple Imputation Methods for Large Multi-Scale Data Sets with Missing or Suppressed Values [D]. Cao, Jian. 2018
6. A NONPARAMETRIC MULTIPLE IMPUTATION APPROACH FOR DATA WITH MISSING COVARIATE VALUES WITH APPLICATION TO COLORECTAL ADENOMA DATA [O]. Chiu-Hsieh Hsu, Qi Long, Yisheng Li.
7. A Nonparametric Multiple Imputation Approach for Data with Missing Covariate Values with Application to Colorectal Adenoma Data [O]. Chiu-Hsieh Hsu, Qi Long, Yisheng Li. 2014
