Conceptual Search and Text Categorization

Abstract

The most fundamental problem in information retrieval is that of interpreting the information needs of users, typically expressed in a short query. Using the surface-level representation of the query is especially unsatisfactory when the information needs are topic-specific, such as "US politics" or "Space Science", which seem to require understanding what the query means rather than what it is. We suggest that a newly proposed semantic representation of words (GabrilovichMa2007) can be used to support Conceptual Search. Namely, it allows retrieving documents on a given topic even when existing keyword-based search approaches fail. The method we develop allows us to categorize and retrieve documents topically on the fly, without looking at the data collection ahead of time, without knowing a priori the topics of interest, and without training topic categorization classifiers. We compare our approach experimentally to state-of-the-art IR techniques and to machine-learning-based text categorization techniques and demonstrate significant improvement in performance. Moreover, as we show, our method is intrinsically adaptable to new text collections and domains.
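The idea in the abstract of categorizing documents by topic name alone, with no training data, can be illustrated with a minimal sketch. This is not the authors' implementation: the `CONCEPTS` table, its weights, and the function names `concept_vector`, `cosine`, and `categorize` are all hypothetical, and a real system would derive word-to-concept weights from a large knowledge source rather than a hand-written dictionary. The sketch only shows the general shape: map both the topic name and the document into a shared concept space, then pick the topic whose concept vector is most similar.

```python
import math
from collections import Counter

# Hypothetical word-to-concept weights, hand-written for illustration only.
CONCEPTS = {
    "senate":   {"Politics": 0.9, "Law": 0.4},
    "election": {"Politics": 1.0},
    "politics": {"Politics": 1.0},
    "us":       {"Politics": 0.3, "Geography": 0.5},
    "orbit":    {"Astronomy": 0.9, "Physics": 0.3},
    "rocket":   {"Astronomy": 0.6, "Engineering": 0.7},
    "space":    {"Astronomy": 1.0},
    "science":  {"Physics": 0.5, "Astronomy": 0.3},
}


def concept_vector(text):
    """Sum the concept weights of every known word in the text."""
    vec = Counter()
    for token in text.lower().split():
        for concept, weight in CONCEPTS.get(token, {}).items():
            vec[concept] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(document, topic_names):
    """Return the topic name whose concept vector best matches the document."""
    doc_vec = concept_vector(document)
    return max(topic_names, key=lambda t: cosine(doc_vec, concept_vector(t)))


topics = ["US politics", "Space Science"]
print(categorize("the senate election results", topics))
print(categorize("rocket reached orbit", topics))
```

Note that neither example document shares a keyword with the topic name it is assigned to; the match happens only through the shared concepts, which is the situation where plain keyword matching would fail.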

Bibliographic details

  • Year: 2008
  • Original format: PDF
  • Language: English
