Conference: International Conference on String Processing and Information Retrieval

Metric Indexes for Approximate String Matching in a Dictionary

Abstract

We consider the problem of finding all approximate occurrences of a given string q, with at most k differences, in a finite database or dictionary of strings. The strings can be, e.g., natural-language words, such as the vocabulary of a document or a set of documents. This problem has many important applications in both off-line (indexed) and on-line string matching. More precisely, we have a universe U of strings and a non-negative distance function d: U × U → ℕ. The distance function is a metric if it satisfies (i) d(x, y) = 0 ⇔ x = y; (ii) d(x, y) = d(y, x); and (iii) d(x, y) ≤ d(x, z) + d(z, y). The last property is called the "triangle inequality" and is the most important one in our case. Many useful distance functions are known to be metrics; in particular, edit (Levenshtein) distance is a metric, and it is the one we use for d.
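To illustrate how the triangle inequality lets a metric index skip candidates, here is a minimal sketch in Python. It computes edit distance with the standard dynamic program and indexes the dictionary in a BK-tree (Burkhard-Keller tree), one well-known metric index. The BK-tree is a stand-in chosen for illustration; it is not necessarily the structure proposed in the paper, and the names `edit_distance`, `BKTree`, and `search` are hypothetical. Every word stored under an edge labelled e from a pivot p satisfies d(w, p) = e, so by the triangle inequality d(q, w) ≥ |d(q, p) − e|, and a search for radius k only needs to follow edges with |d(q, p) − e| ≤ k.

```python
from __future__ import annotations

from collections.abc import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


class _Node:
    """A stored word plus children keyed by their distance to this word."""

    def __init__(self, word: str) -> None:
        self.word = word
        self.children: dict[int, _Node] = {}


class BKTree:
    """Hypothetical BK-tree over a dictionary, using edit_distance as d."""

    def __init__(self, words: Iterable[str]) -> None:
        self.root: _Node | None = None
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = _Node(word)
            return
        node = self.root
        while True:
            d = edit_distance(word, node.word)
            if d == 0:
                return  # word already stored
            if d not in node.children:
                node.children[d] = _Node(word)
                return
            node = node.children[d]

    def search(self, q: str, k: int) -> list[str]:
        """Return all stored words w with edit_distance(q, w) <= k."""
        if self.root is None:
            return []
        result: list[str] = []
        stack = [self.root]
        while stack:
            node = stack.pop()
            d = edit_distance(q, node.word)
            if d <= k:
                result.append(node.word)
            # Words under edge e are at distance e from node.word, so by the
            # triangle inequality d(q, w) >= |d - e|; prune edges where that exceeds k.
            for e, child in node.children.items():
                if d - k <= e <= d + k:
                    stack.append(child)
        return sorted(result)


words = ["book", "books", "cake", "boo", "boon", "cook", "cape", "cart"]
tree = BKTree(words)
print(tree.search("bool", 1))  # ['boo', 'book', 'boon']
```

Here "books" and "cook" are at distance 2 from "bool", so they fall outside k = 1. The pruning never discards a true answer, because the lower bound |d − e| follows directly from property (iii); how many distance computations it saves depends on the dictionary and on k.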