Conference: Database Systems for Advanced Applications

Hash~(ed)-Join: Approximate String Similarity Join with Hashing


Abstract

String similarity join, which finds pairs of similar strings from string sets, has received extensive attention in the database and information retrieval fields. Existing work usually adopts the filter-and-refine framework for this problem, and various filtering methods have been proposed. Recently, tree-based index techniques under an edit distance constraint have been employed effectively to evaluate string similarity joins. However, they do not scale well to large distance thresholds. In this paper, we propose an approach to approximate string similarity join based on Min-Hashing locality-sensitive hashing and trie-based index techniques. Our approach allows a flexible trade-off between efficiency and performance. An empirical study on real datasets demonstrates that our framework is more efficient and scales better.
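The abstract names the ingredients (filter-and-refine, Min-Hashing locality-sensitive hashing, an edit distance constraint) but not the algorithm itself. The following is a minimal sketch of how such a pipeline might look, not the authors' method: it assumes character q-grams as the hashed features, a banded LSH filter, and plain dynamic-programming edit distance for the refine step, and it omits the trie-based index entirely. All function names and parameters (`qgrams`, `minhash_signature`, `candidate_pairs`, `similarity_join`, `bands`, `rows`, `q`, `tau`) are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def qgrams(s, q=2):
    """Return the set of character q-grams of s (the whole string if shorter than q)."""
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, salted with an integer seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(s, num_hashes=32, q=2):
    """MinHash signature: the minimum hash of the q-gram set under each seed."""
    grams = qgrams(s, q)
    return tuple(min(_hash(seed, g) for g in grams) for seed in range(num_hashes))


def candidate_pairs(strings, bands=16, rows=2, q=2):
    """Filter step (assumed banding scheme): strings sharing any band become candidates."""
    buckets = defaultdict(set)
    for idx, s in enumerate(strings):
        sig = minhash_signature(s, bands * rows, q)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(idx)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity_join(strings, tau, bands=16, rows=2, q=2):
    """Refine step: keep candidate pairs whose edit distance is at most tau."""
    result = []
    for i, j in sorted(candidate_pairs(strings, bands, rows, q)):
        if edit_distance(strings[i], strings[j]) <= tau:
            result.append((strings[i], strings[j]))
    return result
```

Because the filter is probabilistic, this sketch is approximate in the same sense as the abstract: every returned pair satisfies the threshold, but some qualifying pairs may be missed. In this sketch, raising `bands` (or lowering `rows`) admits more candidates and so trades running time for recall, which is one plausible reading of the efficiency/performance trade-off the abstract mentions.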

Bibliographic details

  • Conference location: Bali (ID)
  • Author affiliations

    College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China;

    School of Computer Science, Fudan University, Shanghai 200433, China;

    School of Computer Science, Fudan University, Shanghai 200433, China;

  • Original format: PDF
  • Text language: eng
