Query Result Sharing Based on a Forest Storage Structure in the MapReduce Framework

Shi Lin (石霖); Niu Baoning (牛保宁); Zhang Jinwen (张锦文)

Abstract

Large-scale data analysis today typically runs queries on the MapReduce framework. Because the framework itself is redundant and queries overlap with one another, reusing the results of previously executed queries can significantly improve query execution efficiency. Reuse, however, requires storing results and matching new queries against them, which incurs considerable overhead and offsets part of the benefit. This paper targets the inefficiency of result management and matching in ReStore, a state-of-the-art system for reusing query results. It proposes a forest structure for storing and managing Jobs, together with a matching algorithm adapted to that structure, to improve matching efficiency and reduce system overhead. To let the system reuse the results of executed queries as fully as possible, it also proposes a preprocessing scheme for batches of queries: the order in which queries enter the Pig compiler is rearranged, which changes the Job execution order so that Jobs loading the same datasets are executed together and fewer matches against the repository are needed. Experiments show that, in building the storage structure and matching existing results, the proposed method reduces time cost by 16.3% compared with ReStore and also scales better.
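The abstract does not give the details of the forest structure or the matching algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes one tree per input dataset, with each stored plan reduced to a linear list of operator strings (no joins or multi-input Jobs). The names `ResultForest`, `store`, `match`, and `order_by_dataset` are hypothetical and do not come from the paper or from ReStore. Matching only walks the tree of the dataset the query loads, which is one way matching could be localized, and the preprocessing step simply groups queries that load the same dataset.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One operator in a stored plan; result_path is set if its output was kept."""
    op: str
    result_path: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


class ResultForest:
    """Hypothetical result repository: one tree per input dataset (an assumption)."""

    def __init__(self) -> None:
        self.roots: Dict[str, Node] = {}

    def store(self, dataset: str, ops: List[str], result_path: str) -> None:
        """Record that running `ops` in order on `dataset` produced `result_path`."""
        node = self.roots.setdefault(dataset, Node(op=f"LOAD {dataset}"))
        for op in ops:
            node = node.children.setdefault(op, Node(op=op))
        node.result_path = result_path

    def match(self, dataset: str, ops: List[str]) -> Tuple[int, Optional[str]]:
        """Return (operators covered, stored path) for the longest reusable prefix."""
        node = self.roots.get(dataset)
        best: Tuple[int, Optional[str]] = (0, None)
        if node is None:
            return best  # no tree for this dataset, so nothing to compare against
        for depth, op in enumerate(ops, start=1):
            node = node.children.get(op)
            if node is None:
                break
            if node.result_path is not None:
                best = (depth, node.result_path)
        return best


def order_by_dataset(queries: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Reorder (query_id, dataset) pairs so queries on the same dataset are adjacent.

    Datasets keep their first-seen order; queries keep their order within a dataset.
    """
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for query in queries:
        groups[query[1]].append(query)
    return [query for group in groups.values() for query in group]


forest = ResultForest()
forest.store("logs", ["FILTER status == 200", "GROUP BY url"], "/tmp/r1")
hit = forest.match("logs", ["FILTER status == 200", "GROUP BY url", "COUNT"])
miss = forest.match("users", ["FILTER age > 30"])
ordered = order_by_dataset([("q1", "logs"), ("q2", "users"), ("q3", "logs")])
```

In this sketch a new query only pays for a walk down a single tree, and a query on a dataset with no stored results is rejected after one dictionary lookup. A real system would also need to compare operator semantics rather than raw strings, which is outside what the abstract describes.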

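Bibliographic information

Source
Science Technology and Engineering (《科学技术与工程》) | 2018, Issue 8 | pp. 220-227 | 8 pages
Authors
Shi Lin; Niu Baoning; Zhang Jinwen
Affiliations

College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024;

College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024;

College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024;

Original format: PDF
Text language: Chinese (chi)
CLC classification: Retrieval machines;
Keywords
MapReduce framework; ReStore system; System overhead;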

