
A Competitive Term Selection Method for Information Retrieval


Abstract

Term selection is a necessary component of most natural language processing tasks. Although various unsupervised techniques have been proposed, the best results, for instance those of entropy-based methods, come at a high computational cost. This paper proposes an unsupervised term selection technique based on a bigram-enriched version of the transition point. Our approach first reduces the corpus vocabulary size using the transition point technique and then expands the reduced corpus with bigrams obtained from the same corpus, i.e., without external knowledge sources. This approach considerably reduces the dimensionality of the TREC-5 collection and has also been shown to improve precision for some entropy-based methods.
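
The two steps described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption: the transition point is computed with one commonly cited formulation, TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the number of terms occurring exactly once; the neighborhood parameter `width`, the threshold `min_count`, and the rule for choosing bigrams are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def transition_point(freqs):
    """Assumed formulation: TP = (sqrt(8 * I1 + 1) - 1) / 2,
    where I1 is the number of terms that occur exactly once."""
    i1 = sum(1 for count in freqs.values() if count == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def select_terms(tokens, width=0.4):
    """Keep terms whose frequency lies within +/- width * TP of the
    transition point. `width` is a hypothetical tuning parameter."""
    freqs = Counter(tokens)
    tp = transition_point(freqs)
    low, high = tp * (1 - width), tp * (1 + width)
    selected = {term for term, count in freqs.items() if low <= count <= high}
    return selected, tp


def enrich_with_bigrams(tokens, selected, min_count=2):
    """Hypothetical enrichment rule: add bigrams from the same corpus that
    occur at least `min_count` times and contain a selected term."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        pair
        for pair, count in bigrams.items()
        if count >= min_count and (pair[0] in selected or pair[1] in selected)
    }
```

Calling `select_terms` on a tokenized corpus returns the reduced vocabulary and the transition point; passing that vocabulary to `enrich_with_bigrams` yields the bigrams to add. No external resource is consulted at either step, which matches the abstract's description, though the paper's actual selection criteria may differ.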
