
Learning from and actively selecting pairwise constraints in data science.



Abstract

In the last decade, we have witnessed a data explosion in computer vision, data mining, and bioinformatics. In this big-data era, how to analyze ever-larger datasets effectively and efficiently is attracting increasing attention. As datasets grow, so does the number of classes, which makes it hard to ask humans to provide an accurate class label for each data point. Accordingly, we study another type of human supervision: pairwise constraints, which represent the degree of semantic similarity between a pair of data points and can be obtained relatively easily in large-scale scenarios. Based on pairwise constraints, we study two kinds of problems and make progress on both: utilizing pairwise constraints for distance metric learning, and actively selecting pairwise constraints for semi-supervised clustering.

First, based on pairwise constraints, we present an effective metric learning method built on the max-margin framework, together with its kernelized extension. To learn the distance function efficiently, we adopt the cutting-plane algorithm and propose an approximation method based on matching pursuit that allows linear-time training even in the kernel case.

However, like most metric learning methods, this approach learns a single Mahalanobis metric, which cannot handle heterogeneous data well. Methods that learn multiple metrics throughout the feature space have demonstrated superior accuracy, but at a severe cost to computational efficiency. Thus, we consider a new angle on the metric learning problem and learn a single metric that is able to implicitly adapt its distance function throughout the feature space. This metric adaptation is accomplished by using a random forest-based classifier to underpin the distance function and by incorporating both absolute pairwise position and standard relative position into the representation.

Our third contribution addresses fast nearest-neighbor retrieval in large-scale datasets under a given learned distance. We propose a novel hashing method that represents each data point as a compact binary code to achieve both fast queries and efficient data storage. Unlike most traditional hashing methods, which focus on learning projection functions, our method focuses on a new quantization strategy that adaptively assigns varying numbers of bits to different hyperplanes based on their information content, which significantly improves on traditional hashing methods.

Finally, rather than selecting pairwise constraints at random, we select them actively by assessing how the selected constraints would improve the final assignment for semi-supervised clustering. We propose a novel online framework for active semi-supervised spectral clustering that selects pairwise constraints as clustering proceeds, based on the principle of uncertainty reduction, and that significantly improves performance over random sampling.
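To make the notion of learning a distance from pairwise constraints concrete, the following is a minimal sketch, not the thesis's method: it only evaluates a margin-based (hinge) loss for a candidate Mahalanobis metric over similar and dissimilar pairs. The function names `mahalanobis_sq` and `hinge_loss`, the `threshold` and `margin` parameters, and the toy data are all assumptions made for illustration; the cutting-plane solver and matching-pursuit approximation described in the abstract are not reproduced here.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical encoding of a pairwise constraint: (i, j, label), where
# label = +1 marks a similar pair and label = -1 a dissimilar pair.
Constraint = Tuple[int, int, int]


def mahalanobis_sq(L: List[List[float]], x: Vector, y: Vector) -> float:
    """Squared Mahalanobis distance ||L (x - y)||^2, i.e. with M = L^T L."""
    diff = [a - b for a, b in zip(x, y)]
    proj = [sum(l_k * d for l_k, d in zip(row, diff)) for row in L]
    return sum(p * p for p in proj)


def hinge_loss(
    L: List[List[float]],
    points: Sequence[Vector],
    constraints: Sequence[Constraint],
    threshold: float = 1.0,  # assumed decision threshold on squared distance
    margin: float = 0.5,     # assumed margin around the threshold
) -> float:
    """Sum of hinge losses over all constraints.

    A similar pair incurs no loss once its distance is at most
    threshold - margin; a dissimilar pair incurs no loss once its
    distance is at least threshold + margin.
    """
    total = 0.0
    for i, j, label in constraints:
        d = mahalanobis_sq(L, points[i], points[j])
        total += max(0.0, margin + label * (d - threshold))
    return total


# Toy data: points 0 and 1 are declared similar, points 0 and 2 dissimilar.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
constraints = [(0, 1, +1), (0, 2, -1)]

identity = [[1.0, 0.0], [0.0, 1.0]]  # plain Euclidean distance
shrunk_x = [[0.5, 0.0], [0.0, 1.0]]  # metric that shrinks the first axis

loss_identity = hinge_loss(identity, points, constraints)
loss_shrunk = hinge_loss(shrunk_x, points, constraints)
```

In this toy setup the metric that shrinks the first axis satisfies both constraints with the assumed margin, while the Euclidean metric violates the margin on the similar pair; a max-margin learner would search for such a metric rather than have it supplied by hand.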

Bibliographic record

  • Author

    Xiong, Caiming

  • Author affiliation

    State University of New York at Buffalo

  • Degree-granting institution: State University of New York at Buffalo
  • Subject: Artificial Intelligence
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 137 p.
  • Total pages: 137
  • Original format: PDF
  • Language of text: eng