Journal: 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence)

BMGSJoin: A MapReduce-Based Graph Similarity Join Algorithm

Abstract

Graph similarity join is widely used in data mining, particularly in data preprocessing, where it supports tasks such as data cleaning and near-duplicate detection, so it is an important problem to study. This paper studies graph similarity join under an edit distance constraint: given two sets of graphs, return all graph pairs whose edit distance does not exceed a given threshold. Built on the distributed programming framework MapReduce, the MGSJoin algorithm adopts a "filtering-verification" framework and uses path-based q-gram signatures to prune candidate pairs that cannot be answers, i.e., count filtering. Because this algorithm can generate a very large number of key-value pairs, Bloom Filter is introduced to improve it, yielding the BMGSJoin algorithm. Experimental results show that the two proposed graph similarity join algorithms considerably improve on the efficiency and scalability of existing algorithms and can better meet the demands of current big data mining and analysis.
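To make the count filtering step concrete, here is a minimal single-machine sketch in Python. It is not the paper's MapReduce implementation. It assumes a path-based q-gram is the label sequence of a simple path with q edges, and it assumes the lower bound commonly used with such q-grams: two graphs within edit distance tau must share at least max(|Q_r| - tau * D_r, |Q_s| - tau * D_s) q-grams, where D is the largest number of q-grams a single edit operation can affect. The function names, the graph encoding, and the parameters `d_r` and `d_s` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def path_qgrams(labels, edges, q):
    """Return the multiset of path-based q-grams of an undirected labeled graph.

    labels: dict mapping vertex id -> label (assumed encoding)
    edges:  iterable of (u, v) pairs
    A q-gram here is the label sequence of a simple path with q edges.
    """
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    seen = set()       # paths already counted, in canonical vertex order
    grams = Counter()  # label sequence -> multiplicity

    def extend(path):
        if len(path) == q + 1:
            # A path and its reverse are the same undirected path.
            key = min(tuple(path), tuple(reversed(path)))
            if key not in seen:
                seen.add(key)
                seq = tuple(labels[v] for v in path)
                grams[min(seq, seq[::-1])] += 1
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # keep the path simple
                extend(path + [nxt])

    for v in labels:
        extend([v])
    return grams


def passes_count_filter(grams_r, grams_s, tau, d_r, d_s):
    """Keep the pair only if it shares enough q-grams (assumed bound).

    d_r, d_s: assumed maximum number of q-grams one edit can affect
    in each graph; they are supplied by the caller in this sketch.
    """
    shared = sum((grams_r & grams_s).values())  # multiset intersection
    lower = max(sum(grams_r.values()) - tau * d_r,
                sum(grams_s.values()) - tau * d_s)
    return shared >= lower


# Two small path graphs: A-B-C and A-B-D.
r = path_qgrams({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)], q=1)
s = path_qgrams({0: "A", 1: "B", 2: "D"}, [(0, 1), (1, 2)], q=1)

# Relabeling the middle vertex touches both q-grams, so D = 2 here.
keep_tau1 = passes_count_filter(r, s, tau=1, d_r=2, d_s=2)
keep_tau0 = passes_count_filter(r, s, tau=0, d_r=2, d_s=2)
```

Pairs that pass the filter are only candidates; in a filtering-verification framework they still need an exact edit distance check, which this sketch omits.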
