
Kernel-Based Ranking. Methods for Learning and Performance Estimation



Abstract

Machine learning provides tools for automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank.

In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction, and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently.

The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis.

The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
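The regularized pairwise least-squares loss mentioned in the abstract can be illustrated with a small sketch. This is not the RankRLS implementation from the thesis: it is a plain linear (non-kernel) model that omits the thesis's computational shortcuts, and the function names `fit_pairwise_least_squares` and `pairwise_accuracy` are hypothetical. It relies on the identity that the sum of squared differences over all pairs of a vector v equals v^T L v with L = nI - 11^T, which gives a closed-form solution. For binary labels, the pairwise accuracy computed here coincides with the AUC.

```python
import numpy as np


def fit_pairwise_least_squares(X, y, lam=1.0):
    """Linear w minimizing sum_{i<j} ((y_i - y_j) - (x_i - x_j) @ w)^2 + lam * ||w||^2.

    Illustrative assumption: a linear model solved in closed form,
    not the kernelized method described in the thesis.
    """
    n, d = X.shape
    # Sum over all pairs of squared differences of v equals v^T L v,
    # where L = n*I - 1 1^T (Laplacian of the complete graph).
    L = n * np.eye(n) - np.ones((n, n))
    A = X.T @ L @ X + lam * np.eye(d)
    b = X.T @ L @ y
    return np.linalg.solve(A, b)


def pairwise_accuracy(scores, y):
    """Fraction of pairs with different labels that the scores order correctly.

    Tied scores count as half correct. With binary labels this is the AUC.
    """
    score_diff = scores[:, None] - scores[None, :]
    label_diff = y[:, None] - y[None, :]
    mask = label_diff > 0  # ordered pairs where the first item should rank higher
    correct = (score_diff[mask] > 0).sum() + 0.5 * (score_diff[mask] == 0).sum()
    return correct / mask.sum()


# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

w = fit_pairwise_least_squares(X, y, lam=0.1)
print("pairwise accuracy on training data:", pairwise_accuracy(X @ w, y))
```

No intercept term is fitted because a constant offset cancels in every pairwise difference and therefore does not affect the ranking.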

Bibliographic record

  • Author

    Airola Antti

  • Author affiliation
  • Year 2011
  • Total pages
  • Original format PDF
  • Text language en
  • Chinese Library Classification
