
Learning to rank with partially-labeled data.



Abstract

Ranking is a key problem in many applications. In web search, for instance, webpages are ranked such that the most relevant ones are presented to the user first. In machine translation, a set of hypothesized translations is ranked so that the correct one is chosen. Abstractly, the problem of ranking is to predict an ordering over a set of objects. Given the importance of ranking in many applications, "Learning to Rank" has risen as an active research area, crossing disciplines such as machine learning and information retrieval. The approach is to adapt machine learning techniques developed for classification and regression problems to problems with rank structure. However, so far the majority of research has focused on the supervised learning setting. Supervised learning assumes that the ranking algorithm is provided with labeled data indicating the rankings or permutations of objects. Such labels may be expensive to obtain in practice.

The goal of this dissertation is to investigate the problem of ranking in the framework of semi-supervised learning. Semi-supervised learning assumes that data is only partially labeled, i.e., for some sets of objects, labels are not available. This kind of framework seeks to exploit the potentially vast amount of cheap unlabeled data in order to improve upon supervised learning. While both supervised learning for ranking and semi-supervised learning for classification have become active research themes, the combination, semi-supervised learning for ranking, has been less examined. This thesis aims to fill the gap.

The contribution of this thesis is an examination of several ways to exploit unlabeled data in ranking. In particular, four assumptions commonly used in classification (Change of Representation, Covariate Shift, Low Density Separation, Manifold) are extended to the ranking setting. Their implementations are tested on six real-world datasets from Information Retrieval, Machine Translation, and Computational Biology.

The algorithmic contributions of this work include (a) a Local/Transductive meta-algorithm, which allows one to plug in different unlabeled-data assumptions with relative ease, and (b) a kernel defined on lists, which allows one to extend methods that work with samples (i.e., classification, regression) to methods that work with lists of samples (i.e., ranking). We demonstrate that several assumptions about how unlabeled data helps in classification can be successfully applied to the ranking problem, showing improvements over the supervised baseline under different dataset-method combinations.
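As an illustration of the general idea of adapting classification techniques to problems with rank structure, the sketch below shows a generic pairwise reduction: each labeled list is turned into feature-difference examples, a simple perceptron is trained on them, and the learned weights are then used to order a new, unlabeled list. This is not the dissertation's Local/Transductive meta-algorithm or its list kernel; the function names (`pairwise_examples`, `train_perceptron`, `rank`) and the toy data are hypothetical and chosen only for this example.

```python
from itertools import combinations


def pairwise_examples(labeled_lists):
    """Turn labeled lists into (feature-difference, +1/-1) classification examples."""
    examples = []
    for items in labeled_lists:
        for (xa, ya), (xb, yb) in combinations(items, 2):
            if ya == yb:
                continue  # tied items carry no ordering information
            diff = [a - b for a, b in zip(xa, xb)]
            examples.append((diff, 1 if ya > yb else -1))
    return examples


def train_perceptron(examples, n_features, epochs=20):
    """Plain perceptron on difference vectors (an assumed, deliberately simple learner)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for diff, label in examples:
            score = sum(wi * di for wi, di in zip(w, diff))
            if label * score <= 0:  # misordered pair: move w toward the correct order
                w = [wi + label * di for wi, di in zip(w, diff)]
    return w


def rank(w, items):
    """Return item indices sorted from highest to lowest score."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in items]
    return sorted(range(len(items)), key=lambda i: scores[i], reverse=True)


# Toy data (made up): each list holds (feature vector, relevance label) pairs.
train_lists = [
    [([1.0, 0.2], 2), ([0.5, 0.9], 1), ([0.1, 0.4], 0)],
    [([0.9, 0.1], 1), ([0.2, 0.8], 0)],
]
w = train_perceptron(pairwise_examples(train_lists), n_features=2)

# A list without labels is ranked using the learned weights only.
unlabeled_list = [[0.3, 0.5], [0.8, 0.1], [0.6, 0.6]]
order = rank(w, unlabeled_list)
print(order)
```

In this purely supervised sketch the unlabeled list is used only at prediction time; the semi-supervised setting described in the abstract would additionally exploit such lists during training.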

Bibliographic record

  • Author: Duh, Kevin K.
  • Author affiliation: University of Washington
  • Degree-granting institution: University of Washington
  • Subjects: Engineering, Electronics and Electrical; Computer Science; Artificial Intelligence
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 146
  • Original format: PDF
  • Language: English
