Combining Parts of Speech, Term Proximity, and Query Expansion for Document Retrieval

Abstract

Document retrieval systems retrieve documents from a database and order them according to their perceived relevance to a user's search query. This task is difficult for machines because a semantic gap separates the literal meaning of the terms in a query from the user's true intent. The main goal of this study is to modify the Okapi BM25 document retrieval system to improve search results for textual queries over unstructured, textual corpora. The research hypothesizes that Okapi BM25 does not take full advantage of the structure of the text inside documents, and that this structure holds valuable semantic information that can be used to increase the model's accuracy. Okapi BM25 is augmented with modifications that account for a term's part of speech, the proximity between a pair of related terms, the proximity of a term with respect to its location in a document, and query expansion. The study produced 87 modifications, all of which were validated using open-source corpora. The top-scoring modification on the validation set was then tested on the Lisa corpus, where it outperformed Okapi BM25 by 10.25% in mean average precision.
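For readers unfamiliar with the baseline and the metric named in the abstract, the following is a minimal sketch of standard Okapi BM25 scoring and of mean average precision. It does not reproduce any of the 87 modifications from the study. The function names, the pre-tokenized input format, the smoothed non-negative IDF variant, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (plain BM25).

    `query` is a list of terms and `docs` is a non-empty list of non-empty
    term lists. k1 and b are conventional defaults, not values from the paper.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that never goes negative (an assumed variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def average_precision(ranking, relevant):
    """Average precision of one ranked list of document ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant hit
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, relevants):
    """Mean of average precision over a set of queries."""
    pairs = list(zip(rankings, relevants))
    return sum(average_precision(r, rel) for r, rel in pairs) / len(pairs)
```

A relative gain such as the one reported in the abstract would then be computed as `(map_modified - map_baseline) / map_baseline`, where both values come from `mean_average_precision` over the same query set.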

Bibliographic Record

Source: IEEE International Conference on Semantic Computing, 2019, pp. 150-153 (4 pages)
Authors: Eric LaBouve; Lubomir Stanchev
Original format: PDF
Keywords: Mathematical model; Semantics; Google; Training data; Benchmark testing; Search engines
Date indexed: 2022-08-26 13:53:16
