Conference: IEEE International Congress on Big Data

Predictive Modeling in a Big Data Distributed Setting: A Scalable Bias Correction Approach

Abstract

Massive datasets are becoming pervasive in computational sciences. Though this opens new perspectives for discovery and an increasing number of processing and storage solutions is available, it is still an open issue how to transpose machine learning and statistical procedures to distributed settings. Big datasets are no guarantee for optimal modeling since they do not automatically solve the issues of model design, validation and selection. At the same time conventional techniques of cross-validation and model assessment are computationally prohibitive when the size of the dataset explodes. This paper claims that the main benefit of a massive dataset is not related to the size of the training set but to the possibility of assessing in an accurate and scalable manner the properties of the learner itself (e.g. bias and variance). Accordingly, the paper proposes a scalable implementation of a bias correction strategy to improve the accuracy of learning techniques for regression in a big data setting. An analytical derivation and an experimental study show the potential of the approach.
