Conference: International Conference on Internet of Things, Big Data and Security

A First Approach on Big Data Missing Values Imputation

Abstract

Although most techniques and algorithms assume that the data is accurate, measurements in our analog world are far from perfect. Since our capacity to store and process data grows every day, these imperfections will accumulate, leading to poorer decisions and hindering any knowledge extraction process carried out over the raw data. One of the most disturbing imperfections is the presence of missing values. Many inductive algorithms assume that the data is complete, so when they face missing data they either do not work properly or extract lower-quality knowledge. At this point there is no sophisticated missing values treatment implemented in any major Big Data framework. In this contribution, we present two novel imputation methods based on clustering that can be easily ported to Spark's MLlib and that achieve better results than simply removing the faulty examples or filling in the missing values with the mean.
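To make the comparison in the abstract concrete, the following is a minimal single-machine sketch of the three strategies it names: dropping incomplete examples, filling with the column mean, and filling from a cluster. The abstract does not describe the two proposed methods, so the cluster-based routine here is a generic k-means-style stand-in, not the authors' algorithms, and it uses NumPy rather than Spark's MLlib. The function names (`drop_incomplete`, `mean_impute`, `cluster_impute`), the farthest-first initialisation, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def drop_incomplete(X):
    """Baseline 1: remove every row that contains a missing value."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]


def mean_impute(X):
    """Baseline 2: replace each missing value with its column mean."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def cluster_impute(X, k=2, n_iter=10):
    """Illustrative k-means-style imputation (an assumption, not the paper's method).

    Rows are assigned to the nearest centroid using only their observed
    features; each missing value is then filled with the corresponding
    coordinate of the row's centroid. Assumes at least k complete rows.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Deterministic farthest-first initialisation over the complete rows.
    complete = X[~missing.any(axis=1)]
    centroids = [complete[0]]
    while len(centroids) < k:
        d = np.min([((complete - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(complete[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Squared distance to each centroid, ignoring missing features.
        diff = np.where(missing[:, None, :], 0.0, X[:, None, :] - centroids[None, :, :])
        labels = (diff ** 2).sum(axis=2).argmin(axis=1)

        # Update each centroid from the observed values of its members only.
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue
            observed = ~np.isnan(members)
            counts = observed.sum(axis=0)
            sums = np.where(observed, members, 0.0).sum(axis=0)
            centroids[j] = np.where(counts > 0, sums / np.maximum(counts, 1), centroids[j])

    return np.where(missing, centroids[labels], X)


# Toy data: two well-separated groups, one missing value in each.
X = np.array([
    [0.0, 0.0], [0.1, np.nan], [0.0, 0.2],
    [10.0, 10.0], [10.1, np.nan], [10.0, 10.2],
])
print(drop_incomplete(X))
print(mean_impute(X))
print(cluster_impute(X, k=2))
```

On this toy input, mean imputation fills both gaps with the global column mean of 5.1, which lies between the two groups, while the cluster-based sketch fills them with about 0.1 and 10.1, the means of the groups the rows belong to. This only illustrates why a clustering-based fill can beat the two baselines on grouped data; it says nothing about the results reported in the paper.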