Journal: 《武汉工程大学学报》 (Journal of Wuhan Institute of Technology)

Research on a Secondary Sort Algorithm for Eliminating Join Redundancy in Normalized Relations (消除规范关系连接冗余的二次排序算法研究)

             

Abstract

When the MapReduce framework joins two entities that stand in a normalized one-to-many relationship, the attributes of the one-side entity produce a large amount of redundancy in the join result. This work optimizes the secondary sort algorithm by redefining the partitioning step of the Map stage and the sorting and grouping steps of the Shuffle stage. The Map stage then outputs a composite key, which contains the attribute values of the one-side entity and the sort value of the many-side entity, together with a collection of many-side attribute values. The Reduce stage splits the composite key, extracts the primary key of the one-side entity as the row key of an HBase table, and writes the remaining one-side attribute values and the collection of many-side attribute values into the corresponding columns of that table. The join semantics are preserved and the redundancy is eliminated. Experiments show that the optimized algorithm eliminates the redundancy of one-side attribute values in the join result and improves the efficiency of queries over that result.
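The pipeline described in the abstract can be illustrated with a small in-memory simulation. This is a minimal sketch and not the authors' implementation: the department/employee data, the in-memory lookup of one-side attributes in the map step, the column names such as `dept:name`, and the dictionary standing in for the HBase table are all assumptions made for illustration. It shows partitioning and grouping on the one-side primary key only, while sorting uses the primary key plus the many-side sort value.

```python
from itertools import groupby

# Hypothetical sample data: one department (one side) has many employees (many side).
DEPARTMENTS = {"d1": ("Research", "Wuhan"), "d2": ("Sales", "Beijing")}
EMPLOYEES = [
    ("e3", "d1", "Chen", 5200),
    ("e1", "d2", "Li", 4100),
    ("e2", "d1", "Wang", 6100),
    ("e4", "d1", "Zhao", 4800),
]


def map_phase(employees, departments):
    """Emit (composite_key, many_side_props) for each many-side record."""
    for emp_id, dept_id, name, salary in employees:
        # Assumption: one-side attributes are available to the mapper via a lookup.
        dept_props = departments[dept_id]
        # Composite key: one-side primary key, one-side attributes, many-side sort value.
        yield (dept_id, dept_props, salary), (emp_id, name, salary)


def partition(composite_key, num_reducers):
    """Partition on the one-side primary key only, so one department maps to one reducer."""
    return sum(composite_key[0].encode()) % num_reducers


def shuffle(pairs, num_reducers):
    """Bucket by partition, sort by (primary key, sort value), group by primary key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    for bucket in buckets:
        bucket.sort(key=lambda kv: (kv[0][0], kv[0][2]))
        for _, group in groupby(bucket, key=lambda kv: kv[0][0]):
            group = list(group)
            # Expose the first composite key of the group with all of its values.
            yield group[0][0], [value for _, value in group]


def reduce_phase(grouped, table):
    """Split the composite key and write one row per one-side entity."""
    for (dept_id, dept_props, _sort_value), employees in grouped:
        table[dept_id] = {              # row key = one-side primary key
            "dept:name": dept_props[0],  # one-side attributes stored once
            "dept:city": dept_props[1],
            "emp:list": employees,       # many-side attributes, already sorted
        }


def run_join(num_reducers=2):
    table = {}  # a plain dict standing in for the HBase table
    reduce_phase(shuffle(map_phase(EMPLOYEES, DEPARTMENTS), num_reducers), table)
    return table
```

Calling `run_join()` yields one row per department, with the department attributes stored once and the employees listed in ascending order of the sort value (salary in this sketch). In a real Hadoop job, the three redefined steps would correspond to a custom partitioner, sort comparator, and grouping comparator.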
