Which noise affects algorithm robustness for learning to rank

Abstract

When learning-to-rank algorithms are applied in real search applications, noise in human-labeled training data is inevitable and degrades the performance of the algorithms. Previous work has mainly studied how noise affects ranking algorithms and how to design robust ranking algorithms. In our work, we investigate what inherent characteristics make training data robust to label noise and how to use them to guide labeling. Our motivation comes from the observation that the same ranking algorithm may show very different sensitivities to label noise on different data sets. We therefore investigate the underlying reason for this observation using three typical kinds of learning-to-rank algorithms (i.e., pointwise, pairwise and listwise methods) and three public data sets with different properties (i.e., OHSUMED, TD2003 and MSLR-WEB10K). We find that as label noise in the training data increases, it is the document pair noise ratio (referred to as pNoise), rather than the document noise ratio (referred to as dNoise), that explains the performance degradation of a ranking algorithm well. We further identify two inherent characteristics of the training data, namely relevance levels and label balance, that have a great impact on how pNoise varies with label noise (i.e., dNoise). Based on these results, we discuss guidelines on labeling strategy for constructing robust training data for learning-to-rank algorithms in practice.
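The abstract names dNoise and pNoise without giving formulas, so the following is only a minimal sketch under stated assumptions. It assumes dNoise is the fraction of documents for a query whose label differs from the true label, and pNoise is the fraction of document pairs whose preference order (greater, equal, or smaller) differs between the true and noisy labels. The function names `d_noise` and `p_noise` and the toy labels are hypothetical and may not match the paper's exact definitions.

```python
from itertools import combinations


def _sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def d_noise(true_labels, noisy_labels):
    """Assumed dNoise: share of documents whose label was changed."""
    if len(true_labels) != len(noisy_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    changed = sum(t != n for t, n in zip(true_labels, noisy_labels))
    return changed / len(true_labels)


def p_noise(true_labels, noisy_labels):
    """Assumed pNoise: share of document pairs whose preference changed."""
    if len(true_labels) != len(noisy_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(true_labels)), 2))
    if not pairs:
        return 0.0
    changed = sum(
        _sign(true_labels[i] - true_labels[j])
        != _sign(noisy_labels[i] - noisy_labels[j])
        for i, j in pairs
    )
    return changed / len(pairs)


# Toy query with four documents and three relevance levels (0, 1, 2).
# Only the second document's label is corrupted (1 becomes 0).
true_labels = [2, 1, 0, 0]
noisy_labels = [2, 0, 0, 0]
```

On this toy query, one of four labels changes, so `d_noise` is 0.25, while two of the six document pairs change their preference, so `p_noise` is 2/6. The example only shows that the two ratios can differ for the same corruption; it does not reproduce any result from the paper.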

Bibliographic details

  • Source
    Information Retrieval | 2015, Issue 3 | pp. 215-245 | 31 pages
  • Author affiliations

    Chinese Acad Sci, Inst Comp Technol, Beijing, Haidian District, Peoples R China;

    Chinese Acad Sci, Inst Comp Technol, Beijing, Haidian District, Peoples R China;

    Chinese Acad Sci, Inst Comp Technol, Beijing, Haidian District, Peoples R China;

    Chinese Acad Sci, Inst Comp Technol, Beijing, Haidian District, Peoples R China;

    Chinese Acad Sci, Inst Comp Technol, Beijing, Haidian District, Peoples R China;

  • Original format: PDF
  • Language: English
  • Keywords

    Learning to rank; Label noise; Robust data;

