Conference: IEEE International Conference on Cloud Computing Technology and Science

DRASH: A Data Replication-Aware Scheduler in Geo-Distributed Data Centers

Abstract

Driven by the trends of big data and cloud computing, there is a growing demand for processing and analyzing data that is generated and stored across geo-distributed data centers. However, because network bandwidth between data centers is limited and the volume of data spread across different locations keeps growing, it has become increasingly inefficient to aggregate data and perform computations at a single data center. An approach commonly used by data-intensive cluster computing systems such as Hadoop is to distribute computations based on data locality, so that data can be processed where it is stored, which reduces network overhead and improves performance. But limited work has been done to adapt and evaluate such techniques for geo-distributed data centers. In this paper, we propose DRASH (Data Replication-Aware Scheduler), a job scheduling algorithm that enforces data locality to prevent data transfer and exploits data replication to improve overall system performance. Our evaluation, using simulations with realistic workload traces, shows that DRASH can outperform existing approaches by 16% to 60% in average job completion time, and that it achieves greater improvements under higher data replication factors.
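The abstract does not give the details of the DRASH algorithm, so the following is only a minimal sketch of the general idea it describes: run each job at a data center that already holds a replica of its input, and use the choice among replicas to balance load. The `Job` class, the `schedule` function, the site names, and the greedy least-loaded rule are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job description used only for this sketch."""
    name: str
    replica_sites: list[str]  # data centers that hold a replica of the input
    runtime: float            # estimated processing time, arbitrary units


def schedule(jobs, sites):
    """Greedy locality-enforcing placement (an assumption, not DRASH itself).

    Each job may run only at a site holding a replica of its input, so no
    input data crosses data centers. Among those sites, the one that
    becomes free earliest is chosen.
    """
    busy_until = {site: 0.0 for site in sites}
    placement = {}
    for job in jobs:
        candidates = [s for s in job.replica_sites if s in busy_until]
        if not candidates:
            raise ValueError(f"no known site holds a replica for {job.name}")
        # More replicas mean more candidates, hence more room to balance load.
        best = min(candidates, key=lambda s: busy_until[s])
        placement[job.name] = best
        busy_until[best] += job.runtime
    return placement, busy_until


jobs = [
    Job("a", ["dc1", "dc2"], 4.0),
    Job("b", ["dc1"], 3.0),
    Job("c", ["dc1", "dc2"], 2.0),
]
placement, busy_until = schedule(jobs, ["dc1", "dc2"])
```

In this toy run, job `c` has replicas at both sites and is placed at the idle one, which hints at why a higher replication factor gives a scheduler of this kind more freedom.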