Journal: Information Retrieval

Latent word context model for information retrieval


Abstract

The application of word sense disambiguation (WSD) techniques to information retrieval (IR) has yet to provide convincing retrieval results. Major obstacles to effective WSD in IR include the coverage and granularity problems of word sense inventories, the sparsity of document context, and the limited information provided by short queries. To alleviate these issues, we propose constructing latent context models for terms using latent Dirichlet allocation. We build one latent context model per word, using a well-principled representation of local context based on word features. In particular, context words are weighted by a decaying function of their distance to the target word; this function is learnt from data in an unsupervised manner. The resulting latent features are used to discriminate word contexts, so as to constrict the query's semantic scope. Consistent and substantial improvements, including on difficult queries, are observed on TREC test collections, and the technique combines well with blind relevance feedback. Compared to traditional topic modeling, WSD, and positional indexing techniques, the proposed retrieval model is more effective and scales well to large collections.
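The distance-based weighting of context words described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states that the decay function is learnt from data, whereas the exponential form, the `window` size, the `decay_rate` value, and the function name `weighted_context` below are all hypothetical choices made only for illustration.

```python
import math
from collections import defaultdict


def weighted_context(tokens, target_index, window=5, decay_rate=0.5):
    """Return {context_word: weight} for the token at target_index.

    Weights shrink as the distance to the target word grows.
    ASSUMPTION: a fixed exponential decay stands in for the decay
    function that the paper learns from data; `window` and
    `decay_rate` are illustrative parameters, not values from the paper.
    """
    weights = defaultdict(float)
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)
    for i in range(start, end):
        if i == target_index:
            continue  # the target word is not part of its own context
        distance = abs(i - target_index)
        # Adjacent words (distance 1) get weight 1.0; farther words get less.
        weights[tokens[i]] += math.exp(-decay_rate * (distance - 1))
    return dict(weights)


tokens = "the river bank was flooded after heavy rain".split()
context = weighted_context(tokens, tokens.index("bank"), window=2)
```

In a pipeline of the kind the abstract outlines, weighted contexts like `context` would be collected for every occurrence of a word and then passed to a topic model such as latent Dirichlet allocation. That step is omitted here because the abstract does not specify its details.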
