Hadoop is a distributed computing framework that has been widely used to process massive data sets. However, it has shortcomings when processing graph data. Because graph-structured data are strongly coupled, the result cannot be obtained from a single MapReduce computation. Iterative computation is required instead, and a single iteration may itself need several MapReduce jobs. Restarting a MapReduce job is costly, and static data may be transmitted unnecessarily during the iterations. Building on Hadoop, this paper proposes a map-side storage strategy: static data are stored on the map side, and the computations that combine static and dynamic data are completed there. This reduces the total running time of the iterative computation. Experiments on a modified Hadoop platform compared the strategy with the iteration scheme used before the modification, and the results show that the map-side storage strategy reduces running time to some extent.
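The core idea can be illustrated with a toy, single-process Python simulation. This is not Hadoop code and does not use any Hadoop API. PageRank is assumed as the iterative graph workload, and the graph, the damping factor, and the names `baseline_iteration`, `CachingMapper`, `map_side_iteration` and `run` are hypothetical. The sketch counts the records shuffled by each strategy. In the baseline, each node's adjacency list (static data) is re-emitted through the shuffle every round so the reducer can rebuild it. In the map-side variant, the mapper keeps the adjacency lists in memory across iterations, so only rank contributions (dynamic data) are shuffled.

```python
from collections import defaultdict

# Hypothetical toy graph: node -> out-neighbours (the static data).
# Assumes no dangling nodes (every node has at least one out-edge).
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
DAMPING = 0.85  # assumed PageRank damping factor


def baseline_iteration(records):
    """One round where the static structure travels through the shuffle."""
    shuffled = defaultdict(list)
    sent = 0
    for node, (rank, neighbours) in records.items():
        # Re-emit the structure so the reducer can rebuild the record.
        shuffled[node].append(("struct", neighbours))
        sent += 1
        for nb in neighbours:
            shuffled[nb].append(("rank", rank / len(neighbours)))
            sent += 1
    n = len(records)
    out = {}
    for node, values in shuffled.items():
        neighbours = next(v for t, v in values if t == "struct")
        total = sum(v for t, v in values if t == "rank")
        out[node] = ((1 - DAMPING) / n + DAMPING * total, neighbours)
    return out, sent


class CachingMapper:
    """Mapper that holds the static adjacency lists locally across rounds."""

    def __init__(self, graph):
        self.graph = graph  # loaded once, reused by every iteration

    def map(self, ranks):
        shuffled = defaultdict(list)
        sent = 0
        for node, rank in ranks.items():
            neighbours = self.graph[node]  # read from local storage, not shuffled
            for nb in neighbours:
                shuffled[nb].append(rank / len(neighbours))
                sent += 1
        return shuffled, sent


def map_side_iteration(mapper, ranks):
    """One round where only dynamic rank contributions are shuffled."""
    shuffled, sent = mapper.map(ranks)
    n = len(ranks)
    new_ranks = {
        node: (1 - DAMPING) / n + DAMPING * sum(shuffled.get(node, []))
        for node in ranks
    }
    return new_ranks, sent


def run(iterations=10):
    n = len(GRAPH)
    base = {v: (1 / n, nbrs) for v, nbrs in GRAPH.items()}
    base_sent = 0
    for _ in range(iterations):
        base, sent = baseline_iteration(base)
        base_sent += sent

    mapper = CachingMapper(GRAPH)
    ranks = {v: 1 / n for v in GRAPH}
    cached_sent = 0
    for _ in range(iterations):
        ranks, sent = map_side_iteration(mapper, ranks)
        cached_sent += sent

    base_ranks = {v: r for v, (r, _) in base.items()}
    return base_ranks, ranks, base_sent, cached_sent


if __name__ == "__main__":
    print(run(10))
```

With this 3-node, 4-edge graph, ten iterations shuffle 70 records in the baseline and 40 with map-side storage, while both produce the same ranks. Record counts are only a crude proxy, since a structure record carries a whole adjacency list. The sketch also ignores MapReduce job start-up cost, which the abstract identifies as a separate overhead.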