
A New Language Model Combining Single and Compound Terms

Abstract

Most traditional information retrieval (IR) systems are based on single-term indexing. However, it is generally accepted that the semantic content of a document (or a query) cannot be accurately captured by a simple set of independent keywords. Although several works have incorporated phrases or other syntactic information into IR, such attempts have shown only slight benefit at best. In language modeling approaches in particular, this is achieved through bigram or n-gram models. However, in these models all bigrams/n-grams are considered and weighted uniformly. In this paper, we introduce a new approach that considers only certain types of n-grams, "compound terms", and weights them. Experimental results on three test collections showed an improvement.
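The abstract does not say how compound terms are selected or how they are weighted, so the following is only a minimal sketch under stated assumptions, not the authors' method. It scores a document by query likelihood with Jelinek-Mercer smoothing (an assumed choice of smoothing). It adds bigram evidence only for query bigrams that appear in a hypothetical, precomputed set `compound_terms`, scaled by a hypothetical fixed weight `beta`. A uniform bigram model would instead score every query bigram.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def jm_prob(term, doc_counts, coll_counts, lam):
    """Jelinek-Mercer smoothed P(term | document); assumes coll_counts[term] > 0."""
    doc_len = sum(doc_counts.values())
    coll_len = sum(coll_counts.values())
    p_doc = doc_counts[term] / doc_len if doc_len else 0.0
    return (1 - lam) * p_doc + lam * coll_counts[term] / coll_len


def score(query, doc, collection, compound_terms, lam=0.5, beta=0.3):
    """Log query likelihood from unigrams plus a selected subset of bigrams.

    Hypothetical parameters: `compound_terms` is a set of (word, word) tuples,
    `lam` is the smoothing weight in (0, 1], `beta` scales bigram evidence.
    Only query bigrams found in `compound_terms` are scored; all other bigrams
    are ignored, unlike a plain bigram model that uses them all.
    """
    coll_uni, coll_bi = Counter(), Counter()
    for d in collection:  # count per document so bigrams never cross documents
        coll_uni.update(ngram_counts(d, 1))
        coll_bi.update(ngram_counts(d, 2))
    doc_uni, doc_bi = ngram_counts(doc, 1), ngram_counts(doc, 2)

    total = 0.0
    for tok in query:
        term = (tok,)
        if coll_uni[term]:  # skip terms unseen in the collection (would be log 0)
            total += math.log(jm_prob(term, doc_uni, coll_uni, lam))
    for pair in zip(query, query[1:]):
        if pair in compound_terms and coll_bi[pair]:
            total += beta * math.log(jm_prob(pair, doc_bi, coll_bi, lam))
    return total
```

With a toy collection where one document contains "information retrieval" as an adjacent phrase and another contains both words apart, the first outscores the second when `("information", "retrieval")` is in `compound_terms`, and the two tie when `compound_terms` is empty. Keeping the bigram score as a separate weighted term means the sketch reduces to plain unigram query likelihood whenever no compound terms match.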