Journal: ScientificWorldJournal
Efficient and Scalable Graph Similarity Joins in MapReduce

Abstract

With the emergence of massive graph-modeled data, graph similarity joins have become an important problem because of their wide range of applications, including data cleaning and near-duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs whose edit distance is no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm that follows the filtering-verification framework for efficient graph similarity joins. It counts overlapping graph signatures to filter out unpromising candidates. Because the filtering phase can generate too many key-value pairs, spectral Bloom filters are introduced to reduce their number. Furthermore, we integrate a multiway join strategy to speed up verification, for which a MapReduce-based method for graph edit distance (GED) calculation is proposed. Extensive experimental results demonstrate the superior efficiency and scalability of the proposed algorithms.
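
The filtering idea described above, counting the signatures that two graphs share and discarding pairs that share too few, can be illustrated with a small single-machine simulation of a map, shuffle, and reduce round. This is only a sketch under stated assumptions, not the MGSJoin algorithm itself: the abstract does not specify what a signature is or how the overlap bound follows from the edit distance threshold, so the signature strings, the `min_overlap` parameter, and all function names below are hypothetical. As a further simplification, each graph contributes each distinct signature once.

```python
from collections import Counter, defaultdict
from itertools import combinations


def map_phase(graphs):
    """Emit one (signature, graph_id) pair per distinct signature of each graph."""
    for graph_id, signatures in graphs.items():
        for sig in set(signatures):  # assumption: duplicates within a graph are ignored
            yield sig, graph_id


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Count, for every pair of graphs, how many signatures they share."""
    overlap = Counter()
    for graph_ids in groups.values():
        for a, b in combinations(sorted(graph_ids), 2):
            overlap[(a, b)] += 1
    return overlap


def candidate_pairs(graphs, min_overlap):
    """Keep only pairs sharing at least min_overlap signatures (hypothetical bound)."""
    overlap = reduce_phase(shuffle(map_phase(graphs)))
    return sorted(pair for pair, count in overlap.items() if count >= min_overlap)


# Toy input: each graph is represented only by its (hypothetical) signatures.
graphs = {
    "g1": ["C-C", "C-O", "C-N"],
    "g2": ["C-C", "C-O", "O-H"],
    "g3": ["N-H", "O-H"],
}
print(candidate_pairs(graphs, min_overlap=2))
```

Pairs that survive this step would still need to be verified with an exact graph edit distance computation; the sketch covers only candidate generation. It also makes the concern about key-value pairs visible: the map step emits one pair per signature per graph, which is the volume the abstract says spectral Bloom filters are meant to reduce.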
