Learning to rank with partially-labeled data.


Abstract

Ranking is a key problem in many applications. In web search, for instance, webpages are ranked so that the most relevant ones are presented to the user first. In machine translation, a set of hypothesized translations is ranked so that the correct one is chosen. Abstractly, the problem of ranking is to predict an ordering over a set of objects. Given the importance of ranking in many applications, "Learning to Rank" has risen as an active research area, crossing disciplines such as machine learning and information retrieval. The approach is to adapt machine learning techniques developed for classification and regression problems to problems with rank structure. However, so far the majority of research has focused on the supervised learning setting. Supervised learning assumes that the ranking algorithm is provided with labeled data indicating the rankings or permutations of objects. Such labels may be expensive to obtain in practice.

The goal of this dissertation is to investigate the problem of ranking in the framework of semi-supervised learning. Semi-supervised learning assumes that data is only partially labeled, i.e., for some sets of objects, labels are not available. This kind of framework seeks to exploit the potentially vast amount of cheap unlabeled data in order to improve upon supervised learning. While both supervised learning for ranking and semi-supervised learning for classification have become active research themes, the combination, semi-supervised learning for ranking, has been less examined. This thesis aims to fill the gap.

The contribution of this thesis is an examination of several ways to exploit unlabeled data in ranking. In particular, four assumptions commonly used in classification (Change of Representation, Covariate Shift, Low Density Separation, Manifold) are extended to the ranking setting. Their implementations are tested on six real-world datasets from Information Retrieval, Machine Translation, and Computational Biology. The algorithmic contributions of this work include (a) a Local/Transductive meta-algorithm, which allows one to plug in different unlabeled-data assumptions with relative ease, and (b) a kernel defined on lists, which allows one to extend methods that work with samples (i.e., classification, regression) to methods that work with lists of samples (i.e., ranking). We demonstrate that several assumptions about how unlabeled data helps in classification can be successfully applied to the ranking problem, showing improvements over the supervised baseline under different dataset-method combinations.

Bibliographic record

Author: Duh, Kevin K.
Author affiliation: University of Washington
Degree-granting institution: University of Washington
Subject: Engineering, Electronics and Electrical; Computer Science; Artificial Intelligence
Degree: Ph.D.
Year: 2009
Pages: 146 p.
Total pages: 146
Original format: PDF
Language of text: eng

