Journal: IEEE Transactions on Systems, Man, and Cybernetics, Part B

Fast retrieval of electronic messages that contain mistyped words or spelling errors


Abstract

This paper presents an index structure for retrieving electronic messages that contain mistyped words or spelling errors. Given a query string (e.g., a search key), we want to find those messages that approximately contain the query, i.e., certain inserts, deletes and mismatches are allowed when matching the query with a word (or phrase) in the messages. Our approach is to store the messages sequentially in a database and hash their "fingerprints" into a number of "fingerprint files." When the query is given, its fingerprints are also hashed into the files and a histogram of votes is constructed on the messages. We derive a lower bound, based on which one can prune a large number of nonqualifying messages (i.e., those whose votes are below the lower bound) during searching. The paper presents some experimental results, which demonstrate the effectiveness of the index structure and the lower bound.
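The voting-and-pruning idea described in the abstract can be illustrated with a small sketch. The abstract does not define the fingerprints, the hash layout of the fingerprint files, or the lower bound, so everything below is an assumption for illustration: fingerprints are taken to be overlapping character q-grams, the "fingerprint files" are replaced by a single in-memory dictionary, and the threshold is the classical q-gram counting bound rather than the bound derived in the paper. The names `fingerprints`, `build_index`, `vote_threshold`, and `candidates` are hypothetical.

```python
from collections import Counter, defaultdict

Q = 2  # assumed fingerprint length (q-gram size); not taken from the paper


def fingerprints(text, q=Q):
    """Return the overlapping character q-grams of text, lowercased."""
    text = text.lower()
    return [text[i:i + q] for i in range(len(text) - q + 1)]


def build_index(messages, q=Q):
    """Map each q-gram to a Counter of {message id: occurrences}.

    This in-memory dictionary stands in for the hashed fingerprint files.
    """
    index = defaultdict(Counter)
    for msg_id, msg in enumerate(messages):
        for fp in fingerprints(msg, q):
            index[fp][msg_id] += 1
    return index


def vote_threshold(query, k, q=Q):
    """Classical q-gram counting bound (an assumption, not the paper's bound).

    A text that contains the query with at most k insertions, deletions, or
    mismatches shares at least (len(query) - q + 1) - k * q of its q-grams.
    """
    return len(query) - q + 1 - k * q


def candidates(index, num_messages, query, k, q=Q):
    """Build the vote histogram and keep messages that reach the bound."""
    votes = Counter()
    for fp, need in Counter(fingerprints(query, q)).items():
        for msg_id, have in index.get(fp, {}).items():
            votes[msg_id] += min(need, have)  # multiset intersection
    bound = vote_threshold(query, k, q)
    if bound <= 0:
        # The bound prunes nothing; every message remains a candidate.
        return list(range(num_messages)), votes
    return sorted(m for m, v in votes.items() if v >= bound), votes


if __name__ == "__main__":
    msgs = [
        "please send the recieved invoice today",  # misspelled "received"
        "lunch at noon tomorrow",
        "the invoice was received",
    ]
    idx = build_index(msgs)
    ids, votes = candidates(idx, len(msgs), "received", k=2)
    print(ids)  # [0, 2]
```

The bound is only a necessary condition, so the surviving messages are candidates rather than confirmed matches; a full approximate-matching pass over them would still be needed to verify each one.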