Journal: Science Technology and Engineering (科学技术与工程)

Query Result Sharing Based on a Forest Storage Structure in the MapReduce Framework

     

Abstract

Large-scale data analysis today is typically carried out by executing queries in the MapReduce framework. Because MapReduce itself introduces redundancy and queries often overlap, reusing the results of earlier queries can significantly improve execution efficiency. Reuse, however, requires storing those results and matching new queries against them, which incurs considerable overhead and offsets part of the benefit. This work targets the inefficiency of result management and matching in ReStore, a state-of-the-art system for reusing query results. It proposes a forest structure for storing and managing Jobs, together with a matching algorithm adapted to that structure, to speed up query matching and reduce system overhead. To let the system reuse the results of executed queries as fully as possible, a preprocessing scheme for multiple queries is also proposed: it changes the order in which queries enter the Pig compiler, and hence the order in which Jobs execute, so that Jobs loading the same datasets run one after another and matching is localized, reducing the number of matches against the repository. Experiments show that, for building the storage structure and matching existing results, the proposed method reduces time cost by 16.3% compared with ReStore and also scales better.
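
The abstract does not spell out the forest structure or the matching algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each tree in the forest is rooted at a loaded dataset, that a stored Job is a path of operator strings below that root, and that matching means finding the longest stored prefix of a new plan. The names ResultForest, PlanNode, order_by_dataset and the result paths are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PlanNode:
    """One operator in a stored plan (assumed representation)."""
    op: str
    result_path: Optional[str] = None  # hypothetical location of a stored result
    children: Dict[str, "PlanNode"] = field(default_factory=dict)


class ResultForest:
    """Forest of stored plans: one tree per loaded dataset (an assumption)."""

    def __init__(self) -> None:
        self.roots: Dict[str, PlanNode] = {}

    def store(self, dataset: str, ops: List[str], result_path: str) -> None:
        # Walk or create the path of operators under the dataset's tree.
        node = self.roots.setdefault(dataset, PlanNode(op="LOAD " + dataset))
        for op in ops:
            node = node.children.setdefault(op, PlanNode(op=op))
        node.result_path = result_path

    def match(self, dataset: str, ops: List[str]) -> Tuple[int, Optional[str]]:
        # Only the tree for this dataset is searched, so matching stays local.
        # Returns (number of operators covered, stored result path).
        best: Tuple[int, Optional[str]] = (0, None)
        node = self.roots.get(dataset)
        if node is None:
            return best
        for depth, op in enumerate(ops, start=1):
            node = node.children.get(op)
            if node is None:
                break
            if node.result_path is not None:
                best = (depth, node.result_path)
        return best


def order_by_dataset(queries: List[Tuple[str, str, List[str]]]):
    """Reorder (name, dataset, ops) tuples so queries on the same dataset
    are adjacent, keeping the first-seen order of datasets and of queries."""
    first_seen: Dict[str, int] = {}
    for _, dataset, _ in queries:
        first_seen.setdefault(dataset, len(first_seen))
    # sorted() is stable, so queries within one dataset keep their order.
    return sorted(queries, key=lambda q: first_seen[q[1]])
```

Under these assumptions, a new plan is compared only against the tree for its own dataset, and ordering queries by dataset first means consecutive lookups hit the same tree. How the actual system represents Jobs and decides equivalence of operators is not stated in the abstract.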
