Learning from and actively selecting pairwise constraints in data science.

Abstract

In the last decade, we have witnessed a data explosion in computer vision, data mining and bioinformatics. In this big data era, how to analyze ever larger data sets effectively and efficiently is attracting more and more attention. As data sets grow, so does the number of classes, which makes it hard to ask humans to provide an accurate class label for each data point. Accordingly, we study another type of human supervision: pairwise constraints, which represent the degree of semantic similarity between a pair of data points and can be obtained relatively easily in large-scale scenarios. Based on pairwise constraints, we study two kinds of problems and make some progress: utilizing pairwise constraints for distance metric learning, and actively selecting pairwise constraints for semi-supervised clustering.

First, based on pairwise constraints, we present an effective metric learning method built on the max-margin framework, together with its kernelized extension. To learn the distance function efficiently, we adopt the cutting plane algorithm and propose an approximation method based on matching pursuit that allows linear-time training even in the kernel case.

However, like the output of most metric learning methods, this learned distance metric is a single Mahalanobis metric, and it cannot handle heterogeneous data well. Methods that learn multiple metrics throughout the feature space have demonstrated superior accuracy, but at a severe cost to computational efficiency. Thus, we consider a new angle on the metric learning problem and learn a single metric that is able to implicitly adapt its distance function throughout the feature space. This metric adaptation is accomplished by using a random forest-based classifier to underpin the distance function and by incorporating both absolute pairwise position and standard relative position into the representation.

Our third contribution looks at fast nearest neighbor retrieval in large-scale data sets under the learned distance. We propose a novel hashing method that represents each data point as a compact binary code to achieve both fast queries and efficient data storage. Unlike most traditional hashing methods, which focus on learning projection functions, our method focuses on a new quantization strategy that adaptively assigns varying numbers of bits to different hyperplanes based on their information content, which significantly improves on traditional hashing methods.

Finally, rather than selecting pairwise constraints at random, we select them actively by assessing how much the selected constraints would improve the final assignment in semi-supervised clustering. We propose a novel online framework for active semi-supervised spectral clustering that selects pairwise constraints as clustering proceeds, based on the principle of uncertainty reduction, and that significantly improves performance over random sampling.
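The pairwise-constraint setting behind the metric learning contribution can be illustrated with a small sketch. It is not the thesis's algorithm: the abstract does not give the objective, so the hinge loss, the `threshold` parameter, the factorization M = LᵀL, and all function names below are assumptions chosen for illustration. The sketch only shows how similar/dissimilar pairs can score a candidate Mahalanobis metric with a margin.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]
# (index_a, index_b, label): label is +1 for a similar pair, -1 for a dissimilar pair.
Constraint = Tuple[int, int, int]


def mahalanobis_sq(x: Vector, y: Vector, L: Sequence[Vector]) -> float:
    """Squared distance (x - y)^T M (x - y) with M = L^T L, so M is positive semidefinite."""
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(w * d for w, d in zip(row, diff)) for row in L]
    return sum(p * p for p in projected)


def hinge_loss(
    points: Sequence[Vector],
    constraints: Sequence[Constraint],
    L: Sequence[Vector],
    threshold: float = 2.0,  # hypothetical margin centre, not taken from the thesis
) -> float:
    """Sum of margin violations: similar pairs should have distance <= threshold - 1,
    dissimilar pairs should have distance >= threshold + 1."""
    total = 0.0
    for i, j, label in constraints:
        d = mahalanobis_sq(points[i], points[j], L)
        total += max(0.0, 1.0 - label * (threshold - d))
    return total


# Toy data: the second feature is noise, the first one carries the semantics.
points = [(0.0, 0.0), (0.0, 3.0), (2.0, 0.0)]
constraints = [(0, 1, +1), (0, 2, -1)]

euclidean = [(1.0, 0.0), (0.0, 1.0)]      # identity L: plain squared Euclidean distance
first_axis_only = [(1.0, 0.0), (0.0, 0.0)]  # L that ignores the noisy second feature

print(hinge_loss(points, constraints, euclidean))        # 8.0
print(hinge_loss(points, constraints, first_axis_only))  # 0.0
```

Under these assumptions, the metric that ignores the noisy feature satisfies both constraints, while the Euclidean metric violates the similar-pair constraint. A learning method would search over L (or M) to reduce this kind of loss.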

Bibliographic record

Author: Xiong, Caiming
Author affiliation: State University of New York at Buffalo
Degree-granting institution: State University of New York at Buffalo
Subject: Artificial Intelligence
Degree: Ph.D.
Year: 2014
Pages: 137
Format: PDF
Language: English

Similar literature

1. Pairwise Constraint-Guided Sparse Learning for Feature Selection [J]. Liu Mingxia, Zhang Daoqiang. Cybernetics, IEEE Transactions on. 2016, No. 1
2. A novel unsupervised learning model for detecting driver genes from pan-cancer data through matrix tri-factorization framework with pairwise similarities constraints [J]. Xi Jianing, Li Ao, Wang Minghui. Neurocomputing. 2018, No. JUNa28
3. Non-linear metric learning using pairwise similarity and dissimilarity constraints and the geometrical structure of data [J]. Mahdieh Soleymani Baghshah, Saeed Bagheri Shouraki. Pattern Recognition: The Journal of the Pattern Recognition Society. 2010, No. 8
4. FIM-Based Pairwise Selection for Active Learning on Imbalanced Datasets [C]. Lixing Chen, Xuemin Tian, Lianfang Cai. IEEE International Conference on Systems, Man, and Cybernetics. 2015
5. On Feature Selection, Kernel Learning and Pairwise Constraints for Clustering Analysis [D]. Zeng, Hong. 2010
6. Fault Diagnosis by Multisensor Data: A Data-Driven Approach Based on Spectral Clustering and Pairwise Constraints [O]. Massimo Pacella, Gabriele Papadia. 2020
7. Deep Learning vs Spectral Clustering into an active clustering with pairwise constraints propagation [O]. Voiron Nicolas, Benoit Alexandre, Lambert Patrick. 2016
