Journal: ACM Transactions on Database Systems

Boosting the Quality of Approximate String Matching by Synonyms


Abstract

A string-similarity measure quantifies the similarity between two text strings for approximate string matching or comparison. For example, the strings "Sam" and "Samuel" can be considered to be similar. Most existing work that computes the similarity of two strings considers only syntactic similarities, for example, the number of common words or q-grams. While this is indeed an indicator of similarity, there are many important cases where syntactically different strings can represent the same real-world object. For example, "Bill" is a short form of "William," and "Database Management Systems" can be abbreviated as "DBMS." Given a collection of predefined synonyms, the purpose of this article is to explore such existing knowledge to effectively evaluate the similarity between two strings and efficiently perform similarity searches and joins, thereby boosting the quality of approximate string matching.
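
To illustrate the idea in the abstract, here is a minimal Python sketch of a word-level Jaccard similarity that consults a list of predefined synonyms before comparing. It is not the measure defined in the article, whose details the abstract does not give. The rule list `SYNONYMS`, the helper names, and the choice to expand both strings with every applicable rule are all assumptions made for illustration.

```python
def tokens(s):
    """Lowercased set of whitespace-separated words."""
    return set(s.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (purely syntactic)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical synonym rules, taken from the examples in the abstract.
SYNONYMS = [
    ("bill", "william"),
    ("dbms", "database management systems"),
]


def expand(s, rules):
    """Token set of s, plus the words of any synonym whose other side occurs in s."""
    # Pad with spaces so that multi-word phrases match on word boundaries only.
    text = " " + " ".join(s.lower().split()) + " "
    out = set(text.split())
    for lhs, rhs in rules:
        for src, dst in ((lhs, rhs), (rhs, lhs)):
            if f" {src} " in text:
                out |= set(dst.split())
    return out


def synonym_jaccard(s, t, rules=SYNONYMS):
    """Jaccard similarity computed after synonym expansion of both strings."""
    return jaccard(expand(s, rules), expand(t, rules))


plain = jaccard(tokens("Bill Gates"), tokens("William Gates"))  # about 0.33
boosted = synonym_jaccard("Bill Gates", "William Gates")        # 1.0
abbrev = synonym_jaccard("DBMS", "Database Management Systems")  # 1.0
```

Expanding with every applicable rule is the simplest policy and can overstate similarity when a rule fires by coincidence. A more careful scheme would choose which rules to apply per string pair.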
