Conference: IEEE International Congress on Big Data
Predictive Modeling in a Big Data Distributed Setting: A Scalable Bias Correction Approach

Abstract

Massive datasets are becoming pervasive in the computational sciences. Although this opens new perspectives for discovery, and a growing number of processing and storage solutions are available, how to transfer machine learning and statistical procedures to distributed settings remains an open issue. Big datasets do not guarantee optimal modeling, since they do not automatically solve the problems of model design, validation and selection. At the same time, conventional techniques for cross-validation and model assessment become computationally prohibitive as the size of the dataset explodes. This paper argues that the main benefit of a massive dataset lies not in the size of the training set but in the possibility of assessing the properties of the learner itself (e.g. bias and variance) in an accurate and scalable manner. Accordingly, the paper proposes a scalable implementation of a bias correction strategy to improve the accuracy of learning techniques for regression in a big data setting. An analytical derivation and an experimental study show the potential of the approach.
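The abstract does not spell out the algorithm, so the following is only a minimal sketch of one generic way to estimate and correct a learner's bias when data are plentiful. It is not the paper's method. It assumes a deliberately biased learner (a straight line fitted to a quadratic target), trains one copy per disjoint chunk (standing in for distributed partitions), and estimates the bias of the averaged predictor from binned residuals on a separate held-out chunk. All names (`fit_linear`, `make_data`, `bin_index`, `run`) and the synthetic data are hypothetical.

```python
import random


def fit_linear(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b


def make_data(n, rng):
    """Synthetic regression data (assumption): y = x^2 + Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, x * x + rng.gauss(0.0, 0.1)))
    return data


def bin_index(x, n_bins):
    """Map x in [-1, 1] to one of n_bins equal-width bins."""
    return min(int((x + 1.0) / 2.0 * n_bins), n_bins - 1)


def run(seed=0, n_chunks=20, chunk_size=500, n_bins=10):
    rng = random.Random(seed)
    # Disjoint chunks play the role of partitions held by different workers.
    chunks = [make_data(chunk_size, rng) for _ in range(n_chunks)]
    holdout = make_data(5000, rng)  # used only to estimate the bias
    test = make_data(5000, rng)     # used only to score both predictors

    # One model per chunk; each fit is independent, so this step parallelizes.
    models = [fit_linear(chunk) for chunk in chunks]

    def avg_pred(x):
        # Averaging over chunks approximates the learner's expected prediction.
        return sum(a + b * x for a, b in models) / len(models)

    # The mean held-out residual per bin estimates minus the bias in that
    # region, because the noise has zero mean.
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in holdout:
        i = bin_index(x, n_bins)
        sums[i] += y - avg_pred(x)
        counts[i] += 1
    correction = [s / c if c else 0.0 for s, c in zip(sums, counts)]

    def corrected(x):
        return avg_pred(x) + correction[bin_index(x, n_bins)]

    mse_raw = sum((y - avg_pred(x)) ** 2 for x, y in test) / len(test)
    mse_corr = sum((y - corrected(x)) ** 2 for x, y in test) / len(test)
    return mse_raw, mse_corr


if __name__ == "__main__":
    raw, corr = run()
    print(f"MSE without correction: {raw:.4f}")
    print(f"MSE with bias correction: {corr:.4f}")
```

In this toy setting the per-chunk fits and the residual sums are both simple aggregates, which is what would make such a scheme amenable to a map-reduce style implementation; whether the paper proceeds this way is not stated in the abstract.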
