首页> 外文OA文献 >A Study of the Bipartite Ranking Problem in Machine Learning
【2h】

A Study of the Bipartite Ranking Problem in Machine Learning

机译:机器学习中的二部排序问题研究

摘要

The problem of ranking, in which the goal is to learn a real-valued ranking function that induces a ranking or ordering over an instance space, has recently gained attention in machine learning. A particular setting of interest is the bipartite ranking problem, in which instances come from two categories, positive and negative; the learner is given examples of instances labeled as positive or negative, and the goal is to learn from these examples a ranking function that ranks future positive instances higher than negative ones. This thesis makes four important contributions to the understanding of the bipartite ranking problem.First, we derive large deviation and uniform convergence bounds for the bipartite ranking error, a quantity used to measure the quality of a bipartite ranking function. The large deviation bound serves to bound the expected error of a ranking function in terms of its empirical error on an independent test sample; the uniform convergence bound serves to bound the expected error of a learned ranking function in terms of its empirical error on the training sample from which it is learned. The uniform convergence bound is expressed in terms of a new set of combinatorial parameters that we term the bipartite rank-shatter coefficients.Second, we define a model of learnability for bipartite ranking functions, and derive a number of results in this model. In particular, we derive both upper and lower bounds on the sample complexity of learning ranking functions in this model. The upper bound is expressed in terms of the bipartite rank-shatter coefficients. The lower bound is expressed in terms of another new combinatorial parameter that we term the rank dimension.Third, we derive generalization bounds for bipartite ranking algorithms based on the notion of algorithmic stability. Unlike bounds based on uniform convergence, these bounds can be applied also to algorithms that search function classes of unbounded complexity. In particular, we are able to apply our results to obtain generalization bounds for kernel-based ranking algorithms, to which bounds based on uniform convergence are often not applicable.Finally, we demonstrate a practical application of bipartite ranking to a problem in bioinformatics, namely the problem of identifying genes related to a given disease based on microarray data. Our studies on leukemia and colon cancer data sets show very promising results, including the identification of some exciting candidate genes as potential targets for drug development.
机译:排名问题的目标是学习一个实值排名函数,该函数可以在实例空间上进行排名或排序,这种问题最近在机器学习中得到了关注。一个特别令人感兴趣的问题是二分排名问题,在这种情况下,实例来自正面和负面两类。向学习者提供了标记为肯定或否定的实例的示例,目标是从这些示例中学习一种排名函数,该函数将未来的肯定实例的排名高于否定实例。本文为理解二元排名问题做出了四个重要贡献。首先,我们得出了二元排名误差的大偏差和一致收敛范围,该量用于衡量二元排名函数的质量。大偏差范围用于根据独立测试样本上的经验误差来限制排名函数的预期误差;统一收敛边界用于根据已学习的排名函数的预期误差在训练样本的学习样本上的经验误差来对其进行约束。统一收敛边界用一组新的组合参数表示,我们称其为二分排名-破碎系数。其次,我们定义了二分排名函数的可学习性模型,并在该模型中得出了许多结果。特别是,在此模型中,我们得出了学习排名函数的样本复杂度的上限和下限。上限用二分秩-破碎系数表示。下限用另一个我们称为秩维的新组合参数表示。第三,我们基于算法稳定性的概念推导了二部排名算法的推广界。与基于统一收敛的边界不同,这些边界还可应用于搜索无限复杂功能类的算法。特别是,我们能够将我们的结果应用于基于核的排序算法的泛化界线,而基于均匀收敛的界线通常不适用。最后,我们证明了二部排名对生物信息学问题的实际应用,即基于微阵列数据鉴定与特定疾病相关的基因的问题。我们对白血病和结肠癌数据集的研究显示出非常有希望的结果,包括鉴定一些令人兴奋的候选基因作为药物开发的潜在靶标。

著录项

  • 作者

    Agarwal Shivani;

  • 作者单位
  • 年度 2005
  • 总页数
  • 原文格式 PDF
  • 正文语种 {"code":"hu","name":"Hungarian","id":19}
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号