Quality versus efficiency in document scoring with learning-to-rank models

Abstract

Learning-to-Rank (LtR) techniques leverage machine learning algorithms and large amounts of training data to induce high-quality ranking functions. Given a set of documents and a user query, these functions predict a score for each document, and the scores are in turn used to rank the documents. Although the scoring efficiency of LtR models is critical in several applications (e.g., it directly affects the response time and throughput of Web query processing), it has received relatively little attention so far.

The goal of this work is to experimentally investigate the scoring efficiency of LtR models along with their ranking quality. Specifically, we show that machine-learned ranking models exhibit a quality versus efficiency trade-off. For example, each family of LtR algorithms has tuning parameters that can influence both effectiveness and efficiency, where higher ranking quality is generally obtained with more complex and expensive models. Moreover, LtR algorithms that learn complex models, such as those based on forests of regression trees, are generally more expensive and more effective than algorithms that induce simpler models, such as linear combinations of features.

We extensively analyze the quality versus efficiency trade-off of a wide spectrum of state-of-the-art LtR algorithms, and we propose a sound methodology to devise the most effective ranker given a time budget. To guarantee reproducibility, we used publicly available datasets and we contribute an open source C++ framework providing optimized, multi-threaded implementations of the most effective tree-based learners: Gradient Boosted Regression Trees (GBRT), Lambda-Mart (λ-MART), and the first public-domain implementation of Oblivious Lambda-Mart (Ωλ-MART), an algorithm that induces forests of oblivious regression trees. We investigate how the different training parameters affect the quality versus efficiency trade-off, and provide a thorough comparison of several algorithms in the quality-cost space. The experiments show that there is no overall best algorithm; the optimal choice depends on the time budget.
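The idea of picking the most effective ranker under a time budget can be illustrated with a minimal Python sketch. This is not the paper's methodology or its C++ framework. It only assumes that each candidate model has already been measured for scoring cost and ranking quality. The `Ranker` type, the model names, and every number below are hypothetical values invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Ranker:
    name: str       # hypothetical model label
    cost_us: float  # assumed scoring cost per document, in microseconds
    quality: float  # assumed ranking quality on validation data (e.g. NDCG@10)


def pareto_frontier(rankers: List[Ranker]) -> List[Ranker]:
    """Drop every ranker that is beaten on quality by a cheaper-or-equal one."""
    frontier: List[Ranker] = []
    best_quality = float("-inf")
    # Visit candidates from cheapest to most expensive; ties prefer higher quality.
    for r in sorted(rankers, key=lambda r: (r.cost_us, -r.quality)):
        if r.quality > best_quality:
            frontier.append(r)
            best_quality = r.quality
    return frontier


def best_within_budget(rankers: List[Ranker], budget_us: float) -> Optional[Ranker]:
    """Return the highest-quality ranker whose cost fits the budget, or None."""
    affordable = [r for r in rankers if r.cost_us <= budget_us]
    return max(affordable, key=lambda r: r.quality, default=None)


# Invented measurements, for illustration only.
candidates = [
    Ranker("linear", 0.1, 0.40),
    Ranker("small_forest", 2.0, 0.45),
    Ranker("slow_small_forest", 3.0, 0.44),  # costlier and worse than small_forest
    Ranker("large_forest", 20.0, 0.48),
]

frontier = pareto_frontier(candidates)
choice = best_within_budget(candidates, budget_us=5.0)
```

With these made-up numbers, the frontier keeps three of the four candidates, and the choice changes as the budget grows, which mirrors the abstract's conclusion that no single algorithm is best for every time budget.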

Bibliographic details

  • Source
    Information Processing & Management | 2016, Issue 6 | pp. 1161-1177 | 17 pages
  • Author affiliations

    Innovation Design och Teknik (IDT), Mälardalens högskola, Västerås, Sweden;

    Istituto di Scienza e Tecnologie dell'Informazione (ISTI) of the National Research Council of Italy (CNR), Pisa, Italy and Istella Srl, Cagliari, Italy;

    Istituto di Scienza e Tecnologie dell'Informazione (ISTI) of the National Research Council of Italy (CNR), Pisa, Italy and Istella Srl, Cagliari, Italy;

    University Ca' Foscari of Venice, Italy;

    Istituto di Scienza e Tecnologie dell'Informazione (ISTI) of the National Research Council of Italy (CNR), Pisa, Italy and Istella Srl, Cagliari, Italy;

    Istituto di Scienza e Tecnologie dell'Informazione (ISTI) of the National Research Council of Italy (CNR), Pisa, Italy and Istella Srl, Cagliari, Italy;

  • Original format: PDF
  • Language of full text: English
  • Keywords

    Efficiency; Learning-to-rank; Document scoring;

