Conference: IEEE International Conference on Semantic Computing

Combining Parts of Speech, Term Proximity, and Query Expansion for Document Retrieval

Abstract

Document retrieval systems retrieve documents from a database and order them according to their perceived relevance to a user's search query. This task is difficult for machines because a semantic gap exists between the literal meaning of the terms in a user's query and the user's true intentions. The main goal of this study is to modify the Okapi BM25 document retrieval system to improve search results for textual queries over unstructured textual corpora. The study hypothesizes that Okapi BM25 does not take full advantage of the structure of text inside documents, and that this structure holds valuable semantic information that can be used to increase the model's accuracy. Okapi BM25 is augmented with modifications that account for a term's part of speech, the proximity between a pair of related terms, the proximity of a term with respect to its location in a document, and query expansion. The study produced 87 modifications, all of which were validated on open-source corpora. The top-scoring modification from the validation set was then tested on the Lisa corpus, where it performed 10.25% better than Okapi BM25 as measured by mean average precision.
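For orientation, the baseline and the metric named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify any of the 87 modifications, so the `term_weight` argument is a hypothetical hook showing where a part-of-speech weight could enter the score. The defaults `k1=1.2` and `b=0.75` and the non-negative IDF variant are common choices, assumed here rather than taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75, term_weight=None):
    """Score tokenized documents against a tokenized query with Okapi BM25.

    term_weight is a hypothetical hook (not from the paper): a dict mapping
    a query term to a multiplier, e.g. a boost chosen from its part of speech.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            # Non-negative IDF variant (assumed; the paper may use another).
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
            weight = 1.0 if term_weight is None else term_weight.get(t, 1.0)
            score += weight * idf * norm
        scores.append(score)
    return scores


def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k that holds a relevant doc."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Under these assumptions, a modified scorer would be compared with the baseline by ranking each query's documents with both, computing `mean_average_precision` for each ranking, and reporting the relative difference.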
