Understanding Inefficiencies in Data-Intensive Computing.

Abstract

New programming frameworks for scale-out parallel analysis, such as MapReduce and Hadoop, have become a cornerstone for exploiting large datasets. However, there has been little analysis of how such systems perform relative to the capabilities of the hardware on which they run. This paper describes a simple model of I/O resource consumption that predicts the ideal lower-bound runtime of a parallel dataflow on a particular set of hardware. Comparing actual system performance to the model's ideal prediction exposes the inefficiency of a scale-out system. Using a simplified dataflow processing tool called Parallel DataSeries, we show that the model's ideal can be approached (i.e., that it is not wildly optimistic), but that a gap of up to 20% remains for workloads using up to 45 nodes. Guided by the model, we analyze inefficiencies exposed in both the disk and networking subsystems, issues that will be faced by any DISC system built atop popular commodity hardware and OSs.
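
The abstract does not give the model's equations, so the following Python sketch is only an illustration of the general idea of an I/O-bound lower bound, not the report's actual model. It assumes a two-phase dataflow in which each node reads its share of the input from local disk while shuffling data to its peers, then writes its share of the output to local disk. The names `Node`, `ideal_runtime_s`, and `relative_gap`, the bandwidth figures, and the choice to measure the gap relative to the ideal runtime are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Per-node hardware limits (illustrative values, not from the report)."""
    disk_mb_s: float  # assumed sustained sequential disk bandwidth
    net_mb_s: float   # assumed usable network bandwidth per node


def ideal_runtime_s(total_mb: float, n_nodes: int, node: Node) -> float:
    """Lower-bound runtime of a toy two-phase dataflow, limited only by I/O.

    Assumes data is spread evenly, output size equals input size, and disk
    and network transfers overlap perfectly within a phase.
    """
    per_node_mb = total_mb / n_nodes

    # Phase 1: read local input while sending away the share owned by peers.
    # With a uniform shuffle, (n - 1) / n of each node's data leaves the node.
    remote_mb = per_node_mb * (n_nodes - 1) / n_nodes
    phase1 = max(per_node_mb / node.disk_mb_s, remote_mb / node.net_mb_s)

    # Phase 2: write the received data to local disk.
    phase2 = per_node_mb / node.disk_mb_s

    return phase1 + phase2


def relative_gap(actual_s: float, ideal_s: float) -> float:
    """Fraction by which a measured runtime exceeds the ideal (assumed definition)."""
    return (actual_s - ideal_s) / ideal_s


if __name__ == "__main__":
    node = Node(disk_mb_s=100.0, net_mb_s=110.0)
    ideal = ideal_runtime_s(total_mb=45_000.0, n_nodes=45, node=node)
    print(f"ideal runtime: {ideal:.1f} s")
    print(f"gap for a 24 s run: {relative_gap(24.0, ideal):.0%}")
```

In this toy setup the disk is the bottleneck in both phases, so a measured runtime above the bound points to overhead beyond raw I/O limits, which is the kind of comparison the abstract describes.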
