Conference: International ACM SIGIR Conference on Research and Development in Information Retrieval

Improving Retrieval of Short Texts Through Document Expansion


Abstract

Collections containing a large number of short documents are becoming increasingly common. As these collections grow in number and size, providing effective retrieval of brief texts presents a significant research problem. We propose a novel approach to improving information retrieval (IR) for short texts based on aggressive document expansion. Starting from the hypothesis that short documents tend to be about a single topic, we submit documents as pseudo-queries and analyze the results to learn about the documents themselves. Document expansion helps in this context because short documents yield little in the way of term frequency information. However, as we show, the proposed technique helps us model not only lexical properties, but also temporal properties of documents. We present experimental results using a corpus of microblog (Twitter) data and a corpus of metadata records from a federated digital library. With respect to established baselines, results of these experiments show that applying our proposed document expansion method yields significant improvements in effectiveness. Specifically, our method improves the lexical representation of documents and the ability to let time influence retrieval.
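
The pseudo-query idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the IDF-weighted overlap score, the function names, the number of neighbours `k`, and the mixing weight `lam` are all assumptions chosen for illustration, and the temporal side of the method is not covered. The sketch submits one short document as a query against the rest of the collection, then mixes the term distribution of the top-ranked results into the document's own term distribution.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize further.
    return text.lower().split()


def mle(tokens):
    # Maximum-likelihood term distribution of a token list.
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def overlap_score(query_tokens, doc_tokens, doc_freq, n_docs):
    # Hypothetical scoring function: IDF-weighted count of shared terms.
    doc_terms = set(doc_tokens)
    return sum(
        math.log(1 + n_docs / doc_freq[t])
        for t in set(query_tokens)
        if t in doc_terms
    )


def expand_document(doc_id, corpus, k=2, lam=0.5):
    # Treat corpus[doc_id] as a pseudo-query and return an expanded
    # term distribution: lam * original + (1 - lam) * neighbours.
    tokenized = [tokenize(d) for d in corpus]
    n_docs = len(corpus)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    query = tokenized[doc_id]

    scored = [
        (overlap_score(query, toks, doc_freq, n_docs), i)
        for i, toks in enumerate(tokenized)
        if i != doc_id
    ]
    top = [(s, i) for s, i in sorted(scored, reverse=True)[:k] if s > 0]

    original = mle(query)
    if not top:
        return original  # nothing retrieved, so nothing to expand with

    # Weight each retrieved document by its normalized retrieval score.
    norm = sum(s for s, _ in top)
    neighbours = Counter()
    for s, i in top:
        for t, p in mle(tokenized[i]).items():
            neighbours[t] += (s / norm) * p

    vocab = set(original) | set(neighbours)
    return {
        t: lam * original.get(t, 0.0) + (1 - lam) * neighbours.get(t, 0.0)
        for t in vocab
    }


corpus = [
    "earthquake hits coast",
    "strong earthquake tsunami warning coast",
    "new phone released today",
    "tsunami warning after earthquake",
]
expanded = expand_document(0, corpus, k=2)
```

In this toy collection the three-word document about an earthquake picks up probability mass for related terms such as "tsunami" and "warning" from its retrieved neighbours, while the unrelated document about a phone contributes nothing. That is the sense in which expansion compensates for the sparse term-frequency information of short texts.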
