A classification approach to the automatic reformulation of Boolean queries in information retrieval.

Abstract

One of the difficulties in using current Boolean-based information retrieval systems is that it is hard for a user, especially a novice, to formulate an effective Boolean query. Users usually resort to trial-and-error searching and often have to rely on an expert (e.g., a librarian) to find the right information. Query reformulation can be even more difficult and complex than formulation, since the user has greater difficulty incorporating the new information gained from the previous search into the next query. In this research, query reformulation is viewed as a classification problem (classifying documents as either relevant or nonrelevant), and a new reformulation algorithm is proposed which builds a tree-structured classifier (named the query tree) at each reformulation from a set of feedback documents retrieved from the previous search; the query tree can be easily transformed into a Boolean query.

To compare the performance of the new approach with that of past Boolean query reformulation algorithms, an evaluation testbed was developed. Its major component is a simulated database characterized by term frequency distributions and an artificial Boolean query; the relevance of documents to a query is judged by the system. The query tree and two of the most important current query reformulation algorithms were compared on benchmark test sets (CACM, CISI, and Medlars) and in the evaluation testbed. The query tree showed significant improvements over the current algorithms in most experiments. We attribute this improved performance to the ability of the query tree algorithm to select good search terms and to represent the relationships among search terms in a tree structure.
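
The general idea of learning a tree from feedback documents and reading a Boolean query off it can be illustrated with a minimal sketch. This is not the dissertation's query tree algorithm: the abstract does not state how splits are chosen, so the sketch assumes a generic entropy-based (ID3-style) split on term presence. The function names (`build_query_tree`, `tree_to_boolean_query`, `matches`), the `max_depth` parameter, and the toy documents are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of relevance labels (True/False)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def build_query_tree(docs, labels, terms=None, max_depth=3):
    """Build a binary tree over term presence (assumed ID3-style splitting).

    docs   -- list of sets of terms (the feedback documents)
    labels -- list of bools, True if the document was judged relevant
    Returns either a bool leaf or a tuple (term, present_subtree, absent_subtree).
    """
    if terms is None:
        terms = set().union(*docs)
    majority = sum(labels) * 2 > len(labels)
    if len(set(labels)) <= 1 or not terms or max_depth == 0:
        return majority

    base = entropy(labels)
    best_term, best_gain = None, 0.0
    for term in sorted(terms):  # sorted for deterministic tie-breaking
        yes = [l for d, l in zip(docs, labels) if term in d]
        no = [l for d, l in zip(docs, labels) if term not in d]
        gain = base - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        if gain > best_gain + 1e-12:
            best_term, best_gain = term, gain
    if best_term is None:  # no term separates the classes any further
        return majority

    rest = terms - {best_term}
    present = [(d, l) for d, l in zip(docs, labels) if best_term in d]
    absent = [(d, l) for d, l in zip(docs, labels) if best_term not in d]
    return (
        best_term,
        build_query_tree([d for d, _ in present], [l for _, l in present], rest, max_depth - 1),
        build_query_tree([d for d, _ in absent], [l for _, l in absent], rest, max_depth - 1),
    )


def tree_to_boolean_query(tree):
    """OR together every root-to-leaf path that ends in a 'relevant' leaf."""
    if isinstance(tree, bool):
        return ""  # a single leaf has no terms to query on
    paths = []

    def walk(node, conditions):
        if isinstance(node, bool):
            if node:
                paths.append(conditions)
            return
        term, present, absent = node
        walk(present, conditions + [term])
        walk(absent, conditions + ["NOT " + term])

    walk(tree, [])
    return " OR ".join("(" + " AND ".join(p) + ")" for p in paths)


def matches(tree, doc):
    """Classify one document (a set of terms) by walking the tree."""
    while not isinstance(tree, bool):
        term, present, absent = tree
        tree = present if term in doc else absent
    return tree


# Toy feedback set (invented for illustration only).
feedback_docs = [
    {"boolean", "query", "reformulation"},
    {"boolean", "query", "feedback"},
    {"boolean", "retrieval"},
    {"boolean", "algebra", "circuit"},
    {"sql", "query"},
    {"image", "retrieval"},
]
relevance = [True, True, True, False, False, False]

tree = build_query_tree(feedback_docs, relevance)
query = tree_to_boolean_query(tree)
print(query)
```

Each path from the root to a "relevant" leaf becomes one conjunction, and the conjunctions are joined with OR, so the resulting query is in disjunctive normal form. In an iterative setting, the documents retrieved by this query would be judged and fed back in to build the next tree.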