Journal: Frontiers of Computer Science in China

String similarity search and join: a survey


Abstract

String similarity search and join are two important operations in data cleaning and integration. They extend the traditional exact search and exact join operations in databases by tolerating errors and inconsistencies in the data. They have many real-world applications, such as spell checking, duplicate detection, entity resolution, and webpage clustering. Although these two problems have been studied extensively over the past decade, no thorough survey of them exists. In this paper, we present a comprehensive survey on string similarity search and join. We first give the problem definitions and introduce widely used similarity functions for quantifying similarity. We then present an extensive set of algorithms for string similarity search and join. We also discuss their variants, including approximate entity extraction, type-ahead search, and approximate substring matching. Finally, we provide some open datasets and summarize research challenges and open problems.
