To Overlap or Not to Overlap: Optimizing Incremental MapReduce Computations for On-Demand Data Upload

Abstract

Research on cloud-based Big Data analytics has so far focused on optimizing the performance and cost-effectiveness of computations, while largely neglecting an important aspect: users must first upload massive datasets to the cloud before computing on them. This paper studies the problem of running MapReduce applications while jointly optimizing the performance and cost of the data upload and its corresponding computation. We analyze the feasibility of incremental MapReduce approaches that advance the computation as far as possible during the data upload, using already transferred data to calculate intermediate results. Our key finding is that overlapping the transfer time with as many incremental computations as possible is not always efficient: a better solution is to wait for enough data to fill the computational capacity of the MapReduce cluster. Results show significant performance improvements and cost reductions compared with state-of-the-art solutions that leverage incremental computations in a naive fashion.
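
The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the paper's method or evaluation setup: the function `simulate`, its parameters, and the cost model (a fixed per-job launch overhead, one fixed-length wave of tasks per `slots` chunks, billing by cluster busy time) are all assumptions made here for illustration. It compares a naive policy that launches an incremental job as soon as any chunk has arrived with a policy that waits until enough chunks are pending to fill every slot.

```python
import math


def simulate(num_chunks, chunk_upload_s, slots, task_s, job_overhead_s, min_batch):
    """Toy model of incremental processing during an upload (hypothetical).

    Chunk i finishes uploading at (i + 1) * chunk_upload_s seconds.
    A job processes every pending chunk in waves of `slots` parallel tasks
    and pays a fixed launch overhead. A job starts only once at least
    `min_batch` chunks are pending (or fewer, if those are the last ones).
    Returns (makespan_s, busy_s): finish time and total cluster busy time.
    """
    now = 0
    processed = 0
    busy = 0
    while processed < num_chunks:
        # Number of chunks fully uploaded by the current time.
        arrived = min(num_chunks, now // chunk_upload_s)
        pending = arrived - processed
        threshold = min(min_batch, num_chunks - processed)
        if pending < threshold:
            # Cluster stays idle until enough chunks have arrived.
            now = (processed + threshold) * chunk_upload_s
            continue
        waves = math.ceil(pending / slots)
        duration = job_overhead_s + waves * task_s
        now += duration
        busy += duration
        processed += pending
    return now, busy


# Made-up parameters: 16 chunks, 10 s upload per chunk, 4 slots,
# 20 s per task wave, 15 s launch overhead per job.
naive = simulate(16, 10, 4, 20, 15, min_batch=1)
batched = simulate(16, 10, 4, 20, 15, min_batch=4)
print("naive   (makespan_s, busy_s):", naive)
print("batched (makespan_s, busy_s):", batched)
```

With these made-up parameters the naive policy launches more jobs, several of which leave slots empty, so it accumulates more launch overhead and busy time than the batched policy. The numbers come only from this toy model and say nothing about the magnitude of the gains reported in the paper.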
