The 19th International Conference on Information Quality, Big Data: Management & Data Quality

An Investigation of How Data Quality is Affected by Dataset Size in the Context of Big Data Analytics



Abstract

In the Big Data era, the volume and availability of datasets are increasing massively throughout industrial organisations. Using this data, these organisations are applying data analytics to provide business insights in ways that have never been exploited before. Despite its critical role in the past, however, data quality is sometimes dismissed as irrelevant in this Big Data world. For example, in a large sample of data, will the effects of any data errors be "scaled out" as we continue to add more data? The aim of this work was to determine, empirically, if and when this is the case. We investigated the effect of completeness problems on data mining classification as we increased the volume of records used to train the classifier. Our results indicate that data quality is even more important in the Big Data world of increased volume. We also found that there are opportunities for managers to improve their analytic results by combining, in the correct proportions, increases in dataset size with improvements to data quality.
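The kind of experiment the abstract describes, varying training-set size and completeness together and then measuring classification accuracy, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not name the datasets, the classifier, or how incompleteness was introduced. The synthetic two-class Gaussian data, the nearest-centroid classifier, the assumption that values go missing completely at random, and every function name below are hypothetical choices made for the example.

```python
import random
import statistics


def make_dataset(n, rng, n_features=5, shift=1.0):
    """Synthetic data (assumption): two Gaussian classes whose means
    differ by `shift` on every feature."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = shift if label == 1 else 0.0
        rows.append(([rng.gauss(centre, 1.0) for _ in range(n_features)], label))
    return rows


def inject_missing(rows, rate, rng):
    """Blank out each feature value independently with probability `rate`.
    Missing-completely-at-random is an assumption of this sketch."""
    return [
        ([None if rng.random() < rate else x for x in feats], label)
        for feats, label in rows
    ]


def fit_centroids(rows, n_features):
    """Per-class feature means, computed from the observed values only."""
    centroids = {}
    for label in (0, 1):
        centroid = []
        for j in range(n_features):
            observed = [f[j] for f, y in rows if y == label and f[j] is not None]
            # Fall back to 0.0 if a feature was never observed for this class.
            centroid.append(statistics.fmean(observed) if observed else 0.0)
        centroids[label] = centroid
    return centroids


def predict(centroids, feats):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(centroid):
        return sum((x - m) ** 2 for x, m in zip(feats, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(centroids, test_rows):
    """Fraction of test rows classified correctly."""
    hits = sum(predict(centroids, feats) == label for feats, label in test_rows)
    return hits / len(test_rows)


def run_experiment(sizes=(50, 200, 1000), rates=(0.0, 0.3, 0.6),
                   seed=0, n_features=5):
    """Train on every (size, missing-rate) pair; score on one complete test set."""
    rng = random.Random(seed)
    test_rows = make_dataset(2000, rng, n_features)
    results = {}
    for n in sizes:
        clean = make_dataset(n, rng, n_features)
        for rate in rates:
            degraded = inject_missing(clean, rate, rng)
            model = fit_centroids(degraded, n_features)
            results[(n, rate)] = accuracy(model, test_rows)
    return results


if __name__ == "__main__":
    for (n, rate), acc in sorted(run_experiment().items()):
        print(f"n={n:5d}  missing={rate:.1f}  accuracy={acc:.3f}")
```

Running the script prints one accuracy figure per combination of training size and missing-value rate, which is the grid a study of this kind would compare. The numbers it produces describe only this toy setup and say nothing about the paper's findings.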
