
Efficiently Translating Complex SQL Query to MapReduce Jobflow on Cloud

Abstract

MapReduce is a widely used programming model for the parallel processing of large-scale data sets in cloud environments. Combining a high-level language with a SQL-to-MapReduce translator lets programmers write in a SQL-like declarative language, so that each program can afterwards be compiled into a MapReduce jobflow automatically. This approach helps narrow the gap between non-professional users and cloud platforms, and thus significantly improves the usability of the cloud. Although a number of translators have been developed, the auto-generated MapReduce programs still suffer from extreme inefficiency. In this paper, we present an efficient Cost-Aware SQL-to-MapReduce Translator (CAT). CAT has two notable features. First, it defines two intra-SQL correlations, Generalized Job Flow Correlation (GJFC) and Input Correlation (IC), based on which a set of looser merging rules is introduced. Building on these rules, both Top-Down (TD) and Bottom-Up (BU) merging strategies are proposed and integrated into CAT. Second, it adopts a cost estimation model for MapReduce jobflows to guide the selection of the more efficient of the jobflows auto-generated by the TD and BU merging strategies. Finally, comparative experiments on the TPC-H benchmark demonstrate the effectiveness and scalability of CAT.
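
The cost-guided selection step described in the abstract can be illustrated with a minimal sketch. This is not CAT's cost model, which the abstract does not specify: the `Job` fields, the per-MB weights, the startup cost, and the data volumes below are all hypothetical, chosen only to show the idea of estimating a cost for each candidate jobflow and keeping the cheaper one.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Job:
    """One MapReduce job, described only by the data volumes it moves (MB)."""
    name: str
    input_mb: float
    shuffle_mb: float
    output_mb: float


# Hypothetical weights; none of these values come from the paper.
READ_COST = 1.0          # cost per MB read by the map phase
SHUFFLE_COST = 2.0       # cost per MB shuffled to reducers
WRITE_COST = 1.5         # cost per MB written by the reduce phase
JOB_STARTUP_COST = 50.0  # fixed overhead for launching one job


def job_cost(job: Job) -> float:
    """Estimate the cost of one job as startup overhead plus weighted I/O."""
    return (JOB_STARTUP_COST
            + READ_COST * job.input_mb
            + SHUFFLE_COST * job.shuffle_mb
            + WRITE_COST * job.output_mb)


def jobflow_cost(jobs: Sequence[Job]) -> float:
    """Estimate a jobflow's cost as the sum of its jobs' costs."""
    return sum(job_cost(j) for j in jobs)


def pick_cheaper(candidates: Mapping[str, Sequence[Job]]) -> str:
    """Return the label of the candidate jobflow with the lowest estimate."""
    return min(candidates, key=lambda label: jobflow_cost(candidates[label]))


# Two made-up candidate jobflows for the same query: one with two jobs,
# one in which the jobs were merged and the intermediate write is avoided.
td_flow = [Job("join", 1000, 400, 200), Job("agg", 200, 50, 10)]
bu_flow = [Job("join+agg", 1000, 400, 10)]

print(jobflow_cost(td_flow))                         # 2515.0
print(jobflow_cost(bu_flow))                         # 1865.0
print(pick_cheaper({"TD": td_flow, "BU": bu_flow}))  # BU
```

In this toy setting merging wins because it removes one job launch and one intermediate write; with different volumes or weights the other candidate could be cheaper, which is why a cost estimate is used to choose rather than a fixed preference.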

Bibliographic details

  • Source
    Cloud Computing, IEEE Transactions on | 2020, Issue 2 | pp. 508-517 | 10 pages
  • Author affiliations

    Jiangsu Provincial Key Laboratory of E-Business, School of Information Engineering, Nanjing University of Finance and Economics, Nanjing, China;

    School of Computer Science and Engineering, Southeast University, Nanjing, China;

    Jiangsu Provincial Key Laboratory of E-Business, School of Information Engineering, Nanjing University of Finance and Economics, Nanjing, China;

    School of Computer Science and Engineering, Southeast University, Nanjing, China;

    Jiangsu Provincial Key Laboratory of E-Business, School of Information Engineering, Nanjing University of Finance and Economics, Nanjing, China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Merging; Correlation; Cats; Cloud computing; Structured Query Language; Integrated circuits; Estimation;
