A Study of the Bipartite Ranking Problem in Machine Learning

机译：机器学习中的二部排序问题研究

页面导航

摘要
著录项
相似文献
相关主题

摘要

The problem of ranking, in which the goal is to learn a real-valued ranking function that induces a ranking or ordering over an instance space, has recently gained attention in machine learning. A particular setting of interest is the bipartite ranking problem, in which instances come from two categories, positive and negative; the learner is given examples of instances labeled as positive or negative, and the goal is to learn from these examples a ranking function that ranks future positive instances higher than negative ones. This thesis makes four important contributions to the understanding of the bipartite ranking problem.First, we derive large deviation and uniform convergence bounds for the bipartite ranking error, a quantity used to measure the quality of a bipartite ranking function. The large deviation bound serves to bound the expected error of a ranking function in terms of its empirical error on an independent test sample; the uniform convergence bound serves to bound the expected error of a learned ranking function in terms of its empirical error on the training sample from which it is learned. The uniform convergence bound is expressed in terms of a new set of combinatorial parameters that we term the bipartite rank-shatter coefficients.Second, we define a model of learnability for bipartite ranking functions, and derive a number of results in this model. In particular, we derive both upper and lower bounds on the sample complexity of learning ranking functions in this model. The upper bound is expressed in terms of the bipartite rank-shatter coefficients. The lower bound is expressed in terms of another new combinatorial parameter that we term the rank dimension.Third, we derive generalization bounds for bipartite ranking algorithms based on the notion of algorithmic stability. Unlike bounds based on uniform convergence, these bounds can be applied also to algorithms that search function classes of unbounded complexity. In particular, we are able to apply our results to obtain generalization bounds for kernel-based ranking algorithms, to which bounds based on uniform convergence are often not applicable.Finally, we demonstrate a practical application of bipartite ranking to a problem in bioinformatics, namely the problem of identifying genes related to a given disease based on microarray data. Our studies on leukemia and colon cancer data sets show very promising results, including the identification of some exciting candidate genes as potential targets for drug development.

机译：排名问题的目标是学习一个实值排名函数，该函数可以在实例空间上进行排名或排序，这种问题最近在机器学习中得到了关注。一个特别令人感兴趣的问题是二分排名问题，在这种情况下，实例来自正面和负面两类。向学习者提供了标记为肯定或否定的实例的示例，目标是从这些示例中学习一种排名函数，该函数将未来的肯定实例的排名高于否定实例。本文为理解二元排名问题做出了四个重要贡献。首先，我们得出了二元排名误差的大偏差和一致收敛范围，该量用于衡量二元排名函数的质量。大偏差范围用于根据独立测试样本上的经验误差来限制排名函数的预期误差；统一收敛边界用于根据已学习的排名函数的预期误差在训练样本的学习样本上的经验误差来对其进行约束。统一收敛边界用一组新的组合参数表示，我们称其为二分排名-破碎系数。其次，我们定义了二分排名函数的可学习性模型，并在该模型中得出了许多结果。特别是，在此模型中，我们得出了学习排名函数的样本复杂度的上限和下限。上限用二分秩-破碎系数表示。下限用另一个我们称为秩维的新组合参数表示。第三，我们基于算法稳定性的概念推导了二部排名算法的推广界。与基于统一收敛的边界不同，这些边界还可应用于搜索无限复杂功能类的算法。特别是，我们能够将我们的结果应用于基于核的排序算法的泛化界线，而基于均匀收敛的界线通常不适用。最后，我们证明了二部排名对生物信息学问题的实际应用，即基于微阵列数据鉴定与特定疾病相关的基因的问题。我们对白血病和结肠癌数据集的研究显示出非常有希望的结果，包括鉴定一些令人兴奋的候选基因作为药物开发的潜在靶标。

著录项

作者
Agarwal Shivani;
展开▼
作者单位

展开▼
年度 2005
总页数
原文格式 PDF
正文语种 {"code":"hu","name":"Hungarian","id":19}
中图分类

相似文献

外文文献
中文文献
专利

1. Proposing a machine-learning based method to predict stillbirth before and during delivery and ranking the features: nationwide retrospective cross-sectional study [J] . Toktam Khatibi, Elham Hanifi, Mohammad Mehdi Sepehri, BMC Pregnancy and Childbirth . 2021,第1期

机译：提出基于机器学习的方法，以在交付之前和在交付之前和排名中的特征：全国回顾性横断面研究
2. On Theoretically Optimal Ranking Functions in Bipartite Ranking [J] . Uematsu Kazuki, Lee Yoonkyung Journal of the American statistical association . 2017,第519期

机译：二分排名理论上的最优排名函数
3. Adaptive partitioning schemes for bipartite ranking How to grow and prune a ranking tree [J] . Stephan Clemencon, Marine Depecker, Nicolas Vayatis Machine Learning . 2011,第1期

机译：双向划分的自适应分区方案如何生长和修剪排名树
4. Minimax Learning Rates for Bipartite Ranking and Plug-in Rules [C] . Stephan Clemencon, Sylvain Robbiano International Conference on Machine Learning . 2011

机译：二分排名和插件规则的最低限度学习率
5. A study of the bipartite ranking problem in machine learning [D] . Agarwal, Shivani 2005

机译：机器学习中的二等排名问题研究
6. Proposing a machine-learning based method to predict stillbirth before and during delivery and ranking the features: nationwide retrospective cross-sectional study [O] . Toktam Khatibi, Elham Hanifi, Mohammad Mehdi Sepehri, 2021

机译：提出基于机器学习的方法来预测交付前和排名和排名的分解：全国回顾性横断面研究
7. Proposing a machine-learning based method to predict stillbirth before and during delivery and ranking the features: nationwide retrospective cross-sectional study [O] . Toktam Khatibi, Elham Hanifi, Mohammad Mehdi Sepehri, 2021

机译：提出基于机器学习的方法来预测交付前和排名和排名的分解：全国回顾性横断面研究

A Study of the Bipartite Ranking Problem in Machine Learning

摘要

著录项

相似文献

相关主题

期刊订阅