Journal: Telecommunication Systems: Modeling, Analysis, Design and Management

Parallelizing Multiple Group-by queries using MapReduce: optimization and cost estimation



Abstract

MapReduce is a new parallel programming model initially developed for large-scale web content processing. Multidimensional data analysis applications must cope with large-scale datasets, and MapReduce offers a chance to use commodity hardware to parallelize them massively. Translating relational algebra operators into MapReduce programs, and optimizing the result, is still an open and active research field. In this paper, we focus on a special type of data analysis query, namely the Multiple Group-by query. We first discuss the communication cost of the MapReduce model, then give an initial implementation of the Multiple Group-by query. We then propose an optimized version that reduces the communication cost. According to our experimental measurements, the optimized version achieves better speed-up and better scalability than the initial implementation. We also formally evaluate our results and give a set of execution time estimations for both the initial implementation and the optimized one.
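
To make the idea of a Multiple Group-by query and its communication cost concrete, the following is a minimal single-process sketch in Python. It is not the authors' implementation: the abstract does not say how their optimized version reduces communication, so map-side partial aggregation (in the spirit of a combiner) is used here purely as an assumed illustration. The table `ROWS`, its column names, and every function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact table: two dimensions ("region", "product"), one measure.
ROWS = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 5},
    {"region": "US", "product": "A", "sales": 7},
    {"region": "US", "product": "A", "sales": 3},
]
GROUP_BY_DIMS = ["region", "product"]


def map_naive(row, dims, measure):
    """Emit one ((dimension, value), measure) pair per group-by dimension."""
    for dim in dims:
        yield (dim, row[dim]), row[measure]


def map_with_local_aggregation(rows, dims, measure):
    """Assumed optimization: pre-aggregate inside one mapper before the shuffle."""
    partial = defaultdict(int)
    for row in rows:
        for key, value in map_naive(row, dims, measure):
            partial[key] += value
    return list(partial.items())


def reduce_sum(pairs):
    """Sum all values that share a key, as reducers would after the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(rows, dims, measure, splits, local_aggregation):
    """Simulate the job; return (result, number of pairs sent to reducers)."""
    chunk = max(1, -(-len(rows) // splits))  # ceiling division, at least 1
    shuffled = []
    for start in range(0, len(rows), chunk):
        block = rows[start:start + chunk]  # the input split of one mapper
        if local_aggregation:
            shuffled.extend(map_with_local_aggregation(block, dims, measure))
        else:
            for row in block:
                shuffled.extend(map_naive(row, dims, measure))
    return reduce_sum(shuffled), len(shuffled)


naive_result, naive_pairs = run(ROWS, GROUP_BY_DIMS, "sales", 2, False)
agg_result, agg_pairs = run(ROWS, GROUP_BY_DIMS, "sales", 2, True)
```

Both runs compute the same per-dimension totals, but the run with local aggregation hands fewer intermediate pairs to the reducers. The number of shuffled pairs serves here as a rough stand-in for the communication cost that the abstract refers to.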
