
A Big Data Placement Strategy in Geographically Distributed Datacenters


Abstract

With the pervasiveness of "Big Data" and the expansion of geographically distributed datacenters in Cloud computing, processing large-scale data applications has become a crucial issue. Indeed, finding the most efficient way to store massive data across distributed locations is increasingly complex. Furthermore, the execution time of a task that requires several datasets might be dominated by the cost of data migrations and exchanges, which depends on the initial placement of the input datasets over the set of datacenters in the Cloud and on the dynamic data management strategy. In this paper, we propose a data placement strategy that improves workflow execution time by reducing the cost of data movements between geographically distributed datacenters, considering characteristics such as storage capacity and read/write speeds. We formalize the overall problem and then propose a data placement algorithm structured in two phases. First, we compute the estimated transfer time to move all involved datasets from their respective locations to the one where the corresponding tasks are executed. Second, we apply a greedy algorithm to assign each dataset to the optimal datacenter with respect to the overall cost of data migrations. Both the heterogeneity of the datacenters and their characteristics (storage and bandwidth) are taken into account. Our experiments are conducted using the CloudSim simulator. The results show that the proposed strategy produces an efficient placement and reduces the overhead of data movement compared to both a random assignment and a selected placement algorithm from the literature.
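The two phases described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's cost model or greedy ordering, so everything here is an assumption made for illustration: the transfer-time estimate is simply dataset size divided by link bandwidth, datasets are placed largest first, and the names `estimate_transfer_times` and `greedy_placement` are hypothetical. Read/write speeds, which the paper also considers, are left out.

```python
def estimate_transfer_times(dataset_sizes, tasks, bandwidth):
    """Phase 1 (assumed cost model): for each dataset d and candidate
    datacenter c, sum size / bandwidth over every task that needs d
    and runs at a different datacenter than c."""
    datacenters = {site for link in bandwidth for site in link}
    cost = {}
    for d, size in dataset_sizes.items():
        cost[d] = {}
        for c in sorted(datacenters):
            total = 0.0
            for task_site, inputs in tasks:
                if d in inputs and task_site != c:
                    total += size / bandwidth[(c, task_site)]
            cost[d][c] = total
    return cost


def greedy_placement(dataset_sizes, capacity, cost):
    """Phase 2 (assumed ordering): place datasets largest first, each on
    the cheapest datacenter that still has enough free storage."""
    free = dict(capacity)
    placement = {}
    for d in sorted(dataset_sizes, key=dataset_sizes.get, reverse=True):
        candidates = [c for c in free if free[c] >= dataset_sizes[d]]
        if not candidates:
            raise ValueError(f"no datacenter can store dataset {d!r}")
        best = min(candidates, key=lambda c: cost[d][c])
        placement[d] = best
        free[best] -= dataset_sizes[d]
    return placement


# Hypothetical toy instance: two datacenters, one task running at "A".
sizes = {"d1": 80.0, "d2": 50.0}                  # GB
capacity = {"A": 100.0, "B": 100.0}               # GB
bandwidth = {("A", "B"): 1.0, ("B", "A"): 1.0}    # GB/s per directed link
tasks = [("A", {"d1", "d2"})]                     # (execution site, inputs)

cost = estimate_transfer_times(sizes, tasks, bandwidth)
placement = greedy_placement(sizes, capacity, cost)
print(placement)
```

In this toy instance both datasets would ideally sit at "A", where the task runs, but the storage limit forces the smaller one to "B". This is the kind of capacity-versus-transfer-cost trade-off the strategy targets.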
