An improved inverted index model and its retrieval algorithm

Abstract

The traditional inverted index scheme is limited because it records only the frequency and positions of terms in documents, not the spatial order of those terms within the document structure. This paper develops an improved inverted index scheme that combines paragraph sequences, sentence sequences, and word sequences into a single list, which replaces the posting list of the traditional inverted index. An algorithm for similarity calculation and text retrieval based on this scheme is also given. The new similarity is the traditional similarity multiplied by a paragraph sequence similarity coefficient, a sentence sequence similarity coefficient, and a word sequence similarity coefficient, denoted SimNew(D,Q) = Sim(D,Q) × Ceof_P × Ceof_s × Ceof_w. Documents are then ranked by this similarity to produce the retrieval results. In an experiment, documents selected from Google search results were reranked by the similarity calculated with this algorithm. The results suggest that the algorithm helps users retrieve information that matches their queries more closely.
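As a rough illustration of the idea in the abstract, the following Python sketch stores each term occurrence as a (paragraph, sentence, word) sequence triple in place of a flat posting list, and combines a base similarity with the three coefficients by multiplication. The abstract does not say how the coefficients or the base similarity are computed, so they are taken as plain inputs here. The function names `build_index` and `sim_new`, the blank-line paragraph split, and the period-based sentence split are all assumptions made for this example, not the paper's method.

```python
from collections import defaultdict


def build_index(docs):
    """Map term -> doc_id -> list of (paragraph, sentence, word) sequence numbers.

    Hypothetical layout: the paper's exact list format is not given in the abstract.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Assumption: paragraphs are separated by a blank line.
        for p, paragraph in enumerate(text.split("\n\n")):
            # Assumption: naive sentence split on "."; a real system would use a tokenizer.
            sentences = [s for s in paragraph.split(".") if s.strip()]
            for s, sentence in enumerate(sentences):
                for w, term in enumerate(sentence.lower().split()):
                    index[term][doc_id].append((p, s, w))
    return index


def sim_new(sim, ceof_p, ceof_s, ceof_w):
    """SimNew(D,Q) = Sim(D,Q) * Ceof_P * Ceof_s * Ceof_w, with all factors supplied by the caller."""
    return sim * ceof_p * ceof_s * ceof_w


docs = {"d1": "Inverted index basics. Index structures vary.\n\nRetrieval uses the index."}
index = build_index(docs)
print(index["index"]["d1"])
print(sim_new(0.8, 1.0, 0.5, 0.5))
```

Because the coefficients enter only as multipliers, a document whose matching terms sit in a different paragraph, sentence, or word order than the query can be ranked lower even when its traditional similarity score is high.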