Source: Mexican Conference on Pattern Recognition

Improving Information Retrieval Through a Global Term Weighting Scheme


Abstract

The output of an information retrieval system is an ordered list of documents corresponding to the user query, which is represented by an input list of terms. This output relies on the estimated similarity between each document and the query. This similarity depends in turn on the weighting scheme used for the terms of the document index. Term weighting therefore plays a major role in estimating this similarity. This paper proposes a new term weighting approach for information retrieval based on marginal frequencies, which consist of the global count of term frequencies over the corpus of documents. Conventional term weighting schemes such as the normalized term frequency, by contrast, take into account the term frequencies within particular documents. The presented experiment shows the advantages and disadvantages of the proposed retrieval scheme. Performance measures such as precision, recall, and F-score are computed on classical benchmarks such as CACM to validate the experimental results.
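The contrast drawn in the abstract between per-document and corpus-wide term counts, and the evaluation measures it names, can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: `normalized_tf` uses max-normalization (one common variant of normalized term frequency), `marginal_frequencies` simply divides each term's corpus-wide count by the total token count, and the toy corpus and function names are hypothetical. This is not the authors' weighting scheme.

```python
from collections import Counter


def normalized_tf(doc_tokens):
    """Local weights (assumed variant): term count divided by the
    largest term count in the same document."""
    counts = Counter(doc_tokens)
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: n / peak for term, n in counts.items()}


def marginal_frequencies(corpus):
    """Global weights (illustrative assumption, not the paper's formula):
    a term's total count over all documents divided by the total
    number of tokens in the corpus."""
    totals = Counter()
    for doc_tokens in corpus:
        totals.update(doc_tokens)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {term: n / grand_total for term, n in totals.items()}


def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy corpus: three tokenized documents.
corpus = [
    ["term", "weight", "term"],
    ["query", "term"],
    ["weight", "query", "index"],
]

local_weights = normalized_tf(corpus[0])       # depends on one document only
global_weights = marginal_frequencies(corpus)  # depends on the whole corpus

# Hypothetical retrieval result: documents 1-4 returned, documents 2, 4, 6 relevant.
precision, recall, f1 = precision_recall_f1([1, 2, 3, 4], [2, 4, 6])
```

In this sketch the local weight of a term changes from document to document, while its global weight is a single value shared across the corpus, which is the distinction the abstract describes.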
