A Study of Document-Context Models in Information Retrieval.

Abstract

In this thesis we study new retrieval models that simulate a "local" relevance decision at every term location in a document. These local relevance decisions are then combined into a "document-wide" relevance decision for the document. The local relevance decision for a term t occurring at the k-th location in a document is made by considering its document-context, which is the window of terms centred at that location. Consequently, the same term t can receive different relevance scores (preferences) at different locations in a document, depending on its document-contexts. This differs from traditional models, in which term t receives the same score regardless of where it occurs in the document.

A hybrid document-context model is studied that combines various existing effective models and techniques. It estimates the relevance decision preference of each document-context as a log-odds and combines the estimated preferences using different types of aggregation operators that comply with the relevance decision principles. The model is evaluated with retrospective experiments to reveal its potential. Besides the retrospective experiments, we also use the top 20 documents from the initial ranked list to perform relevance feedback experiments with a probabilistic document-context model, and the results are promising.

We also show that when the size of the document-contexts is reduced to a single term, the document-context model simplifies to a basic ranking formula that corresponds directly to TF-IDF term weights. TF-IDF term weights can therefore be interpreted as making relevance decisions. This helps to establish a unifying perspective on information retrieval as relevance decision-making and to develop advanced TF-IDF-related term weights for future, more elaborate retrieval models.

Lastly, we develop a new relevance feedback algorithm that splits the ranked document list into multiple lists of document-contexts. The relevance of the documents is then not judged sequentially; this is called active feedback. We show that the new relevance feedback algorithm obtains better results than the conventional relevance feedback algorithm, and does so more reliably than a maximal marginal relevance (MMR) method that does not use document-contexts.
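The local-then-document-wide decision process described in the first two paragraphs can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the thesis's model: the window is assumed to be symmetric with a hypothetical half-width `half_width`, contexts are taken only at query-term locations, the log-odds estimator (a smoothed ratio of query-term to non-query-term positions in the window) is invented for illustration, and `sum`, `max`, and `mean` merely stand in for the aggregation operators, which the abstract does not name.

```python
import math
from statistics import mean
from typing import Callable, Dict, List, Sequence, Set


def context_window(doc: Sequence[str], k: int, half_width: int) -> Sequence[str]:
    """Document-context of location k: doc[k - half_width .. k + half_width], clipped at the edges."""
    return doc[max(0, k - half_width): k + half_width + 1]


def local_log_odds(window: Sequence[str], query: Set[str]) -> float:
    """Toy local relevance preference expressed as a log-odds.

    Hypothetical estimator: compares the number of query-term positions in the
    window with the number of other positions, using add-0.5 smoothing so the
    result stays finite.
    """
    m = sum(1 for t in window if t in query)
    n = len(window)
    return math.log((m + 0.5) / (n - m + 0.5))


# Illustrative aggregation operators only; the thesis's actual operators are not given here.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "max": max,
    "mean": mean,
}


def document_score(doc: Sequence[str], query: Set[str], half_width: int,
                   aggregator: str = "sum") -> float:
    """Combine the local decisions at every query-term location into one document-wide score."""
    preferences = [
        local_log_odds(context_window(doc, k, half_width), query)
        for k, term in enumerate(doc)
        if term in query
    ]
    if not preferences:
        return float("-inf")  # no query term occurs: rank the document last (assumption)
    return AGGREGATORS[aggregator](preferences)
```

With `doc = ["cat", "dog", "x", "x", "x", "x", "cat"]`, query `{"cat", "dog"}` and `half_width=1`, the occurrence of `cat` at location 0 has the context `["cat", "dog"]` and scores log 5, while the occurrence at location 6 has the context `["x", "cat"]` and scores 0. The same term thus receives different local preferences depending on its document-context, which is the behaviour the abstract contrasts with traditional models.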
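The reduction to TF-IDF described in the third paragraph can be illustrated with a small sketch. It rests on one assumption that is not taken from the thesis: that the local log-odds of a one-term context equals the term's inverse document frequency, log(N / df_t). Under that assumption, summing one local decision per query-term location regroups into the sum over query terms t of tf(t, d) · log(N / df_t), which is a basic TF-IDF score. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def idf(term: str, corpus: List[Sequence[str]]) -> float:
    """log(N / df_t), assumed here to be the local log-odds of a one-term context.

    The term must occur in at least one corpus document, otherwise df_t = 0.
    """
    df = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / df)


def unit_context_score(doc: Sequence[str], query: Set[str],
                       corpus: List[Sequence[str]]) -> float:
    """Document-context score with contexts of size one.

    Each query-term location contributes one local decision that depends only
    on the term itself; the local decisions are aggregated by summation.
    """
    return sum(idf(t, corpus) for t in doc if t in query)


def tfidf_score(doc: Sequence[str], query: Set[str],
                corpus: List[Sequence[str]]) -> float:
    """Basic TF-IDF: sum over distinct query terms of tf(t, d) * idf(t)."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, corpus) for t in query if tf[t] > 0)


corpus = [
    "cat sat cat mat".split(),
    "dog sat".split(),
    "cat dog".split(),
]
query = {"cat", "sat"}
for doc in corpus:
    # Per-location summation and per-term tf * idf give the same score.
    assert math.isclose(unit_context_score(doc, query, corpus),
                        tfidf_score(doc, query, corpus))
```

The two functions differ only in how the sum is grouped: by location in `unit_context_score` and by distinct term in `tfidf_score`. That regrouping is the sense in which a TF-IDF weight can be read as an accumulation of local relevance decisions, assuming the IDF-style local estimate above.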

Bibliographic record

  • Author

    Wu, Ho Chung.

  • Author affiliation

    Hong Kong Polytechnic University (Hong Kong).

  • Degree-granting institution: Hong Kong Polytechnic University (Hong Kong).
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 166 p.
  • Total pages: 166
  • Original format: PDF
  • Language: English