Source: The Computer Journal

Data Locality-Aware Big Data Query Evaluation in Distributed Clouds



Abstract

As more businesses and organizations outsource their IT services to distributed clouds to save costs, the historical and operational data generated by these services have been growing exponentially. These data, referred to as big data and stored at geographically distributed datacenters, have become an invaluable asset to these businesses and organizations, which can analyze them to identify business advantages and make strategic decisions. Big data analytics has thus emerged as a main research topic in cloud computing. Efficiently evaluating a big data analytic query in a distributed cloud, which consists of multiple datacenters at different geographic locations interconnected by the Internet, poses great challenges: (i) the source data of the query are typically located at different datacenters; and (ii) the resource demands of the query may exceed what any single datacenter can supply at that moment. In this paper, we formulate an online query evaluation problem for big data analytic queries in distributed clouds, with the objective of maximizing the query acceptance ratio while minimizing the accumulative query evaluation cost. To this end, we first propose a novel metric that models the usage of different resources in the distributed cloud by incorporating the capacities and workloads of different datacenters and links, as well as the resource demands of different queries. We then devise efficient online algorithms for query evaluation under both unsplittable and splittable source data assumptions. Finally, we conduct extensive simulation experiments to evaluate the performance of the proposed algorithms. Experimental results demonstrate that the proposed algorithms are promising and outperform other heuristics at 95% confidence intervals.
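
The abstract does not give the metric or the algorithms themselves. As a rough illustration only, the sketch below shows one common way an online admission scheme can price resources by utilization: a query is placed on the cheapest datacenter that can hold it, or rejected if the cost is too high. The exponential cost function, the `Datacenter` fields, the `admit` function, and the `cost_threshold` parameter are all assumptions made for this example, not the paper's definitions. The sketch covers only the unsplittable case and ignores link capacities.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    name: str
    capacity: float      # total computing capacity (hypothetical units)
    load: float = 0.0    # capacity currently in use


def usage_cost(dc: Datacenter, demand: float, base: float = 2.0) -> float:
    """Assumed metric: cost grows exponentially with the utilization
    the datacenter would reach after hosting the query."""
    utilization = (dc.load + demand) / dc.capacity
    return demand * (base ** utilization - 1.0)


def admit(demand: float, datacenters: list, cost_threshold: float):
    """Place an unsplittable query on the cheapest feasible datacenter.

    Returns the chosen datacenter's name, or None if the query is rejected
    because no datacenter has room or the cheapest option is too costly.
    """
    feasible = [dc for dc in datacenters if dc.load + demand <= dc.capacity]
    if not feasible:
        return None
    best = min(feasible, key=lambda dc: usage_cost(dc, demand))
    if usage_cost(best, demand) > cost_threshold:
        return None
    best.load += demand  # commit the resources to the admitted query
    return best.name
```

Pricing by utilization steers new queries away from heavily loaded datacenters, while the threshold trades acceptance ratio against accumulated cost. A splittable variant would instead divide the demand across several datacenters.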
