Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval

Abstract

Topic-based retrieval is the task of seeking and retrieving material related to a topic of interest. This task involves two subtasks: selecting query terms and ranking the retrieved results. Supervised approaches that assess the importance of a term in a topic or class have proven effective for guiding the query-term selection subtask. This article analyzes and evaluates FDD_β, a supervised term-weighting scheme that can be applied to query-term selection in topic-based retrieval. FDD_β weights terms based on two factors representing the descriptive and discriminating power of the terms with respect to the given topic. It then combines these two factors through an adjustable parameter that makes it possible to favor different aspects of retrieval, such as precision, recall, or a balance between both. Preliminary studies have demonstrated the potential of FDD_β to identify useful query terms. However, those studies limited the analysis to a single domain represented by a single data set with binary categories, and did not compare FDD_β to other recently formulated term-weighting techniques. The contributions of this article are the following: (1) it presents an extensive analysis of the behavior of FDD_β as a function of its adjustable parameter; (2) it compares FDD_β against eighteen traditional and state-of-the-art weighting schemes; (3) it evaluates the performance of disjunctive queries built by combining terms selected using the analyzed methods; (4) it makes a full data set and the full code publicly available to replicate the reported analysis and foster future research in the area. The analysis and evaluations are performed on three data sets: two well-known text data sets, namely 20 Newsgroups and Reuters-21578, and the newly released data set.
The authors conclude that, despite its simplicity, FDD_β is competitive with state-of-the-art methods and has the important advantage of offering flexibility when adapting to specific task goals. The results also demonstrate that FDD_β offers a useful mechanism to explore different approaches to building complex queries.
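
The abstract does not give the FDD_β formula, so the following Python sketch is only an illustration under stated assumptions: descriptive power is taken to be the fraction of the topic's documents that contain the term (recall-like), discriminating power is taken to be the fraction of documents containing the term that belong to the topic (precision-like), and the two are combined with an F_β-style weighted harmonic mean. The function names `fdd_beta_sketch` and `disjunctive_query` and the toy documents are hypothetical and are not taken from the article or its released code.

```python
from collections import Counter


def fdd_beta_sketch(docs, labels, topic, beta=1.0):
    """Score terms for `topic` with an assumed F-beta-style combination.

    This is NOT the article's definition; it is an assumed form that
    matches the abstract's description of two factors and one parameter.
    """
    doc_terms = [set(d.split()) for d in docs]
    topic_terms = [t for t, y in zip(doc_terms, labels) if y == topic]

    # Document frequencies inside the topic and over the whole collection.
    df_topic = Counter(term for terms in topic_terms for term in terms)
    df_all = Counter(term for terms in doc_terms for term in terms)

    b2 = beta ** 2
    scores = {}
    for term, n_in_topic in df_topic.items():
        descr = n_in_topic / len(topic_terms)  # recall-like factor (assumed)
        discr = n_in_topic / df_all[term]      # precision-like factor (assumed)
        scores[term] = (1 + b2) * discr * descr / (b2 * discr + descr)
    return scores


def disjunctive_query(scores, k=2):
    """Join the k highest-scoring terms with OR (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return " OR ".join(ranked[:k])


if __name__ == "__main__":
    docs = ["space shuttle launch", "space orbit", "hockey game", "hockey space"]
    labels = ["sci", "sci", "sport", "sport"]
    balanced = fdd_beta_sketch(docs, labels, "sci", beta=1.0)
    precise = fdd_beta_sketch(docs, labels, "sci", beta=0.5)
    print(disjunctive_query(balanced))
    print(disjunctive_query(precise))
```

Under these assumptions, values of `beta` below 1 push the ranking toward terms that occur almost only inside the topic, while values above 1 push it toward terms that cover most of the topic's documents, which mirrors the precision/recall trade-off the abstract attributes to the adjustable parameter.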

Bibliographic details

  • Source
    Information Processing & Management | 2021, Issue 3 | 102483.1-102483.17 | 17 pages
  • Author affiliations

    Departamento de Ciencias e Ingenieria de la Computacion, Universidad Nacional del Sur, Instituto de Ciencias e Ingenieria de la Computacion (UNS-CONICET), Bahia Blanca, Argentina;

    Departamento de Economia, Universidad Nacional del Sur, Instituto de Matematica de Bahia Blanca (UNS-CONICET), Bahia Blanca, Argentina;

    Departamento de Economia, Universidad Nacional del Sur, Instituto de Matematica de Bahia Blanca (UNS-CONICET), Bahia Blanca, Argentina;

    Departamento de Ciencias e Ingenieria de la Computacion, Universidad Nacional del Sur, Instituto de Ciencias e Ingenieria de la Computacion (UNS-CONICET), Bahia Blanca, Argentina;

  • Original format: PDF
  • Text language: English
  • Keywords

    Term weighting; Variable extraction; Information retrieval; Query-term selection; Topic-based retrieval;
