Journal: Computer Technology and Development (计算机技术与发展)

Research on Iterative Optimization Techniques for Hadoop

     

Abstract

Hadoop is a distributed computing framework for processing massive data sets and has been widely adopted. However, it has shortcomings when processing graph-structured data. Because graph data are strongly coupled, the result cannot be obtained from a single MapReduce computation. The computation must instead iterate, and a single iteration may itself require several MapReduce jobs. Restarting a MapReduce job is expensive, and static data may be transmitted unnecessarily in every iteration. Building on Hadoop, this paper proposes a map-side storage strategy: static data are stored on the map side, and the computations that combine static data with dynamic (state) data are completed there, which reduces the total running time of the iterative computation. Experiments on a modified Hadoop platform, compared against the original iterative scheme, show that the map-side storage strategy reduces running time to a certain extent.
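The difference between the two schemes can be illustrated with a minimal sketch. It assumes PageRank as the iterative graph workload (the abstract does not name a specific algorithm) and simulates MapReduce in memory with a hypothetical `run_job` helper instead of real Hadoop. In the baseline, every mapper re-emits each node's static adjacency list so the reducer can carry it into the next iteration. In the map-side variant, the adjacency lists are loaded once into the mapper (roughly what a per-mapper cache would provide), and only the dynamic rank contributions pass through the shuffle. All names (`GRAPH`, `run_job`, `iterate_baseline`, `iterate_map_side`) are illustrative assumptions, not the paper's implementation. The sketch counts shuffled records only and does not model job-restart overhead.

```python
from collections import defaultdict

# Hypothetical toy graph: adjacency lists are the static data,
# PageRank scores are the dynamic (state) data updated each iteration.
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
DAMPING = 0.85


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory; return (output, shuffled_record_count)."""
    groups = defaultdict(list)
    shuffled = 0
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
            shuffled += 1  # every emitted pair crosses the shuffle
    return {k: reducer(k, vs) for k, vs in groups.items()}, shuffled


# Baseline: static adjacency travels through the shuffle every iteration.
def baseline_mapper(node, state):
    rank, neighbors = state
    yield node, ("adj", neighbors)  # re-emit static data so the reducer can pass it on
    for n in neighbors:
        yield n, ("rank", rank / len(neighbors))


def baseline_reducer(node, values):
    neighbors, total = [], 0.0
    for tag, v in values:
        if tag == "adj":
            neighbors = v
        else:
            total += v
    return ((1 - DAMPING) / len(GRAPH) + DAMPING * total, neighbors)


# Map-side storage: static adjacency is cached in the mapper, never shuffled.
def make_map_side_mapper(cached_graph):
    def mapper(node, rank):
        neighbors = cached_graph[node]  # static data read locally
        for n in neighbors:
            yield n, rank / len(neighbors)
    return mapper


def map_side_reducer(node, contributions):
    return (1 - DAMPING) / len(GRAPH) + DAMPING * sum(contributions)


def iterate_baseline(iterations):
    state = {n: (1 / len(GRAPH), adj) for n, adj in GRAPH.items()}
    total_shuffled = 0
    for _ in range(iterations):
        state, shuffled = run_job(state.items(), baseline_mapper, baseline_reducer)
        total_shuffled += shuffled
    return {n: r for n, (r, _) in state.items()}, total_shuffled


def iterate_map_side(iterations):
    mapper = make_map_side_mapper(GRAPH)  # static data loaded once, reused each iteration
    ranks = {n: 1 / len(GRAPH) for n in GRAPH}
    base = (1 - DAMPING) / len(GRAPH)
    total_shuffled = 0
    for _ in range(iterations):
        out, shuffled = run_job(ranks.items(), mapper, map_side_reducer)
        # Nodes with no incoming contributions keep only the teleport term.
        ranks = {n: out.get(n, base) for n in GRAPH}
        total_shuffled += shuffled
    return ranks, total_shuffled


if __name__ == "__main__":
    r1, s1 = iterate_baseline(10)
    r2, s2 = iterate_map_side(10)
    print(s1, s2)  # prints "90 50"
```

Both variants produce the same ranks. On this toy graph the map-side version shuffles 5 records per iteration instead of 9, because the four adjacency records no longer cross the network. The saving grows with the size of the static data relative to the per-iteration state.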
