Conference: International Conference on String Processing and Information Retrieval

Extending Weighting Models with a Term Quality Measure

Abstract

Weighting models use lexical statistics, such as term frequencies, to derive term weights, which are then used to estimate the relevance of a document to a query. Apart from stopword removal, the quality of the words being 'weighted' is not otherwise considered. Term frequency is often assumed to be a good indicator of how relevant a document is to a query. Our intuition is that raw term frequency could be enhanced to better discriminate between terms. To do so, we propose using non-lexical features to predict the 'quality' of words before they are weighted for retrieval. Specifically, we show how parts of speech (e.g. nouns, verbs) can help estimate how informative a word generally is, regardless of its relevance to a query or document. Experimental results on two standard TREC collections show that integrating the proposed term quality into two established weighting models consistently improves retrieval performance over a baseline that uses the original weighting models.
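To make the general idea concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes a plain TF-IDF weight as the baseline and scales each query term's contribution by a query-independent quality factor looked up from its part-of-speech tag. The tag set, the values in `POS_QUALITY`, the fallback `DEFAULT_QUALITY`, and the function names are all hypothetical and chosen only for illustration; tokens are assumed to arrive already tagged.

```python
import math
from collections import Counter

# Hypothetical quality scores per part-of-speech tag (illustrative values only,
# not taken from the paper).
POS_QUALITY = {"NOUN": 1.0, "VERB": 0.7, "ADJ": 0.6, "ADV": 0.4}
DEFAULT_QUALITY = 0.2  # assumed fallback for any tag not listed above


def idf(term, docs):
    """Smoothed inverse document frequency of `term` over a list of tagged docs."""
    df = sum(1 for doc in docs if any(tok == term for tok, _ in doc))
    return math.log((len(docs) + 1) / (df + 1)) + 1.0


def score(query, doc, docs, use_quality=True):
    """Sum TF-IDF weights of the query terms found in `doc`.

    When `use_quality` is True, each term's weight is multiplied by the
    quality factor of its part-of-speech tag; otherwise the factor is 1.0,
    which gives the plain TF-IDF baseline.
    """
    tf = Counter(tok for tok, _ in doc)
    total = 0.0
    for term, pos in query:
        quality = POS_QUALITY.get(pos, DEFAULT_QUALITY) if use_quality else 1.0
        total += quality * tf[term] * idf(term, docs)
    return total


# Toy collection of (token, tag) pairs; the tags are supplied by hand.
docs = [
    [("retrieval", "NOUN"), ("models", "NOUN"), ("rank", "VERB"), ("documents", "NOUN")],
    [("we", "PRON"), ("rank", "VERB"), ("quickly", "ADV")],
]
query = [("retrieval", "NOUN"), ("quickly", "ADV")]

for i, doc in enumerate(docs):
    baseline = score(query, doc, docs, use_quality=False)
    weighted = score(query, doc, docs, use_quality=True)
    print(f"doc {i}: baseline={baseline:.3f} quality-weighted={weighted:.3f}")
```

In this toy collection each document matches exactly one query term with the same frequency and document frequency, so the baseline cannot separate them. With the assumed quality factors, the document matching the noun scores higher than the one matching the adverb, which is the kind of discrimination between terms the abstract describes.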