Conference: International Conference on Mechatronics and Information Technology

An improved inverted index model and its retrieval algorithm



Abstract

The traditional inverted index scheme is limited because it records only the frequency and positions of terms in documents, not the order in which those terms occur within the documents' structure. This paper develops an improved inverted index scheme that combines paragraph sequences, sentence sequences, and word sequences into a single list, which replaces the posting list of the traditional inverted index. An algorithm for similarity calculation and text retrieval based on this improved scheme is also given. The new similarity is the traditional similarity multiplied by a paragraph sequence similarity coefficient, a sentence sequence similarity coefficient, and a word sequence similarity coefficient, denoted $\mathrm{SimNew}(D,Q) = \mathrm{Sim}(D,Q) \cdot \mathrm{Ceof}_P \cdot \mathrm{Ceof}_S \cdot \mathrm{Ceof}_W$. Documents are ranked by this similarity to produce the retrieval results. In an experiment, a set of documents selected from Google search results was reranked by the similarity calculated with this algorithm. The results show that the algorithm helps users retrieve information that matches their queries more closely.
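
The abstract does not spell out the posting layout or how the three coefficients are computed, so the following is only a minimal sketch under stated assumptions. It assumes each posting stores a (paragraph, sentence, word) triple, that paragraphs are separated by blank lines, and that sentences end at `.`, `!`, or `?`. The names `build_index` and `sim_new` are hypothetical, and the coefficients are passed in as plain numbers rather than derived, since their definitions are not given here.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map term -> doc_id -> list of (paragraph, sentence, word) positions.

    Assumption: the word number restarts in every sentence and the
    sentence number restarts in every paragraph.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        paragraphs = [p for p in text.split("\n\n") if p.strip()]
        for p_no, paragraph in enumerate(paragraphs):
            sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
            for s_no, sentence in enumerate(sentences):
                words = re.findall(r"\w+", sentence.lower())
                for w_no, word in enumerate(words):
                    index[word][doc_id].append((p_no, s_no, w_no))
    return index


def sim_new(sim, ceof_p, ceof_s, ceof_w):
    """SimNew(D,Q) = Sim(D,Q) * Ceof_P * Ceof_S * Ceof_W."""
    return sim * ceof_p * ceof_s * ceof_w


docs = {
    "d1": "Inverted index maps terms. Terms point to documents.\n\n"
          "Retrieval ranks documents.",
}
index = build_index(docs)
print(index["documents"]["d1"])  # positions of "documents" in d1
print(sim_new(0.8, 0.5, 1.0, 0.5))
```

Storing the full triple rather than a flat offset keeps the structural order of terms available at query time, which is the information the three coefficients are said to draw on.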
