Journal: Computer Engineering and Applications (计算机工程与应用)

Research on Multi-Keyword Sorting Methods Based on Hadoop

     

Abstract

Sorting big data by multiple keywords on a single machine takes a long time. To improve sorting efficiency, two multi-keyword sorting methods are presented, both based on the MapReduce model of Hadoop. In the first method, the Reduce function uses the chain radix sort algorithm to sort big data by multiple keywords in parallel, exploiting the computing power of multiple nodes. In the second method, a composite key and a comparator are defined so that the multiple keywords of two records are compared byte by byte, which saves the time otherwise spent deserializing byte streams into objects. The performance of both methods is tested experimentally. The results show that both methods achieve high sorting efficiency and good scalability.
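The two ideas named in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation and it does not use Hadoop: the function names (`chain_radix_sort`, `serialize`, `deserialize`, `compare_raw`), the two-integer record layout, and the big-endian encoding are all assumptions made for illustration. The first function is a least-significant-keyword-first radix sort that uses queues as chained buckets. The second part compares serialized composite keys directly on their bytes, which is the role a raw (byte-level) comparator plays in Hadoop.

```python
import struct
from collections import deque
from functools import cmp_to_key


def chain_radix_sort(records, num_keys, radix):
    """Sort tuples of small ints by multiple keywords (hypothetical layout).

    Assumes every keyword is an int in [0, radix) and that keyword 0 is the
    most significant. Works from the least significant keyword upward; each
    pass is stable, so earlier passes are preserved among ties.
    """
    queue = deque(records)
    for k in reversed(range(num_keys)):
        # One chained bucket (queue) per possible keyword value.
        buckets = [deque() for _ in range(radix)]
        while queue:
            rec = queue.popleft()
            buckets[rec[k]].append(rec)
        # Collect the buckets in order to form the input of the next pass.
        for bucket in buckets:
            queue.extend(bucket)
    return list(queue)


def serialize(record):
    """Encode a (key1, key2) composite key as 8 bytes (assumed format).

    Big-endian unsigned ints are used so that byte order matches numeric order.
    """
    return struct.pack(">II", record[0], record[1])


def deserialize(raw):
    """Decode the 8-byte composite key back into a tuple."""
    return struct.unpack(">II", raw)


def compare_raw(b1, b2):
    """Compare two serialized keys byte by byte, without deserializing them."""
    for x, y in zip(b1, b2):
        if x != y:
            return -1 if x < y else 1
    # All shared bytes are equal: the shorter key sorts first.
    return (len(b1) > len(b2)) - (len(b1) < len(b2))


records = [(2, 1), (0, 3), (2, 0), (1, 1), (0, 0)]
by_radix = chain_radix_sort(records, num_keys=2, radix=4)
by_bytes = sorted((serialize(r) for r in records), key=cmp_to_key(compare_raw))
```

Byte-wise comparison matches numeric order here only because the assumed encoding is fixed-width, unsigned, and big-endian. Signed or variable-length fields would need an encoding designed to preserve order, or a comparator that is aware of field boundaries.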
