Exploitation of unlabeled data and related tasks in semi-supervised learning.


Abstract

Supervised learning has proven an effective technique for learning a classifier when enough labeled data is available. Unfortunately, in many applications a generous supply of labeled data is not available because labeling each datum is costly. Supervised algorithms are known to generalize poorly when only a limited amount of labeled data is available. There has been much recent work on semi-supervised learning and multitask learning; both try to improve the generalization of classifiers by using information sources beyond the labeled data.

In this thesis, we design two semi-supervised algorithms, termed parameterized neighborhood-based classification (PNBC) and label iteration, that efficiently exploit the data manifold information provided by both the labeled and the unlabeled data to improve generalization. PNBC represents the label probability at a given data point by mixing over all data points in a neighborhood, which is formed via a Markov random walk over the entire data manifold. Label iteration is a very simple algorithm that has a closed-form solution in the limit. Experimental results demonstrate the effectiveness of both algorithms. Based on PNBC, we further propose an efficient active learning procedure for the unexploded ordnance (UXO) detection problem, employing the mutual-information criterion.

With PNBC as a building block, we make the first attempt to integrate the benefits offered by both semi-supervised learning and multitask learning (MTL), by proposing semi-supervised multitask learning. In the semi-supervised MTL setting, we have M partially labeled data manifolds, each defining a classification task and involving the design of a PNBC classifier. The M PNBC classifiers are designed simultaneously within a unified sharing structure. The superior performance of semi-supervised MTL on real sensing applications demonstrates that manifold information and information from related tasks can play positive and complementary roles, suggesting that one can find significant benefits in practice by performing semi-supervised MTL.
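The abstract does not give the update rules for PNBC or label iteration, so the following is only a generic illustration of the two ideas it names: a Markov random walk over the data, and an iterative label update with a closed-form limit. It is a minimal sketch of standard graph-based label propagation, not the thesis's algorithms. The Gaussian affinity, the names `transition_matrix`, `iterate_labels`, and `closed_form`, and the parameters `sigma` and `alpha` are all assumptions made for illustration.

```python
import numpy as np

def transition_matrix(X, sigma=1.0):
    """One-step random-walk matrix built from Gaussian affinities (assumed kernel)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Normalize each row so that P[i, j] is the probability of stepping from i to j.
    return W / W.sum(axis=1, keepdims=True)

def iterate_labels(P, Y0, alpha=0.9, n_iter=200):
    """Repeat F <- alpha * P @ F + (1 - alpha) * Y0 (generic propagation rule)."""
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * (P @ F) + (1.0 - alpha) * Y0
    return F

def closed_form(P, Y0, alpha=0.9):
    """Fixed point of the iteration above: (1 - alpha) * (I - alpha * P)^{-1} @ Y0."""
    n = P.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * P, Y0)

# Toy data (hypothetical): two well-separated clusters, one labeled point in each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
Y0 = np.zeros((6, 2))
Y0[0, 0] = 1.0  # point 0 is labeled as class 0
Y0[3, 1] = 1.0  # point 3 is labeled as class 1

P = transition_matrix(X)
F_iter = iterate_labels(P, Y0)
F_closed = closed_form(P, Y0)
predicted = F_closed.argmax(axis=1)  # class with the largest propagated score
```

Because `P` is row-stochastic and `alpha` is below 1, the iteration contracts toward the fixed point, so the iterated scores and the linear-solve result agree up to numerical tolerance. Unlabeled points take their class from the labeled point they are connected to through the random walk.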

Bibliographic record

  • Author: Liu, Qiuhua
  • Author affiliation: Duke University
  • Degree-granting institution: Duke University
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 98 p.
  • Total pages: 98
  • Original format: PDF
  • Language of text: English
