Expert Systems with Applications

A kernel semi-supervised distance metric learning with relative distance: Integration with a MOO approach


Abstract

Metric learning, which aims to determine an appropriate distance function that accurately measures the similarity and dissimilarity between data points, is one of the most popular ways to enhance the performance of machine learning methods such as K-means clustering and the K-nearest-neighbor classifier. These algorithms may perform poorly because they rely on the standard Euclidean distance, which ignores any statistical regularities that might be estimated from a large training set of labeled examples. In many real-world applications, the Euclidean distance is not suitable for capturing the intrinsic similarity and dissimilarity between data points. In contrast to existing metric learning algorithms, which use a large amount of labeled data in the form of must-link (ML) and cannot-link (CL) constraints as side information when the granularity of the true clustering is unknown, our proposed approach uses only a small amount of labeled data in the form of relative-distance constraints, namely equality constraints, C-eq, and inequality constraints, C-neq. To satisfy such constraints, we project the initial Euclidean distance matrix, using Bregman projection, onto the convex subset of constraints so that all constraints are satisfied. Since Bregman projection is not orthogonal (satisfying the current constraint may violate previously satisfied ones), we need to select a proper subset of constraints to learn a better distance function. A multi-objective framework is utilized to select a good subset of constraints that helps obtain a proper labeling of the data set. The selected subset of constraints is used to adjust the initial kernel matrix. The K-means clustering technique is then applied to the adjusted kernel matrix to label the data set. To evaluate the quality of the obtained labeling, different external and internal cluster validity indices are deployed.
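The cyclic, non-orthogonal nature of the projection step can be illustrated with a toy sketch. The update rule, function names, and step size below are illustrative assumptions, not the paper's exact Bregman projection: constraints are visited in turn, and fixing one pair of distances can disturb another, so several sweeps are made.

```python
# Hypothetical sketch: cyclically nudging a symmetric distance matrix
# toward relative-distance constraints of the C-eq / C-neq form.
# The corrective update and the learning rate are illustrative only.

def project(D, constraints, iters=50, lr=0.5):
    """D: symmetric distance matrix (list of lists, modified in place).
    constraints: list of (kind, i, j, k) where
      "eq"  means d(i,j) should equal d(i,k)         (C-eq)
      "neq" means d(i,j) should be smaller than d(i,k) (C-neq)."""
    for _ in range(iters):
        for kind, i, j, k in constraints:
            a, b = D[i][j], D[i][k]
            if kind == "eq" and a != b:
                # move both distances toward their midpoint
                m = (a + b) / 2.0
                D[i][j] = D[j][i] = a + lr * (m - a)
                D[i][k] = D[k][i] = b + lr * (m - b)
            elif kind == "neq" and a >= b:
                # shrink d(i,j) and grow d(i,k) until the inequality holds
                gap = a - b + 1e-3
                D[i][j] = D[j][i] = a - lr * gap
                D[i][k] = D[k][i] = b + lr * gap
    return D
```

Because each update only looks at the current constraint, a sweep that repairs one constraint can break an earlier one; this is exactly why the paper selects a well-behaved subset of constraints rather than projecting onto all of them.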
The values of these indices are simultaneously optimized using the search capability of MOO, with the aim of selecting an appropriate subset of constraints. The proposed approach is evaluated on the UCI Human Activity Recognition Using Smartphones data set (v1.0) along with nine other popular data sets. Results show that our approach outperforms state-of-the-art semi-supervised metric learning algorithms with respect to different internal and external cluster validity indices. (C) 2018 Published by Elsevier Ltd.
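The selection step described above can be sketched as a Pareto filter: each candidate constraint subset is scored on several validity indices at once, and only non-dominated subsets are kept. The subset names and index values below are made up for illustration; the paper's MOO search and its specific validity indices are not reproduced here.

```python
# Hypothetical sketch of multi-objective selection of constraint subsets.
# Every score is treated as "higher is better".

def pareto_front(candidates):
    """candidates: list of (name, scores); return names of the
    Pareto-optimal (non-dominated) candidates."""
    def dominates(a, b):
        # a dominates b if it is at least as good everywhere
        # and strictly better somewhere
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))
    return [name for name, s in candidates
            if not any(dominates(t, s) for _, t in candidates)]

subsets = [
    ("S1", (0.62, 0.55)),  # (internal index, external index) - made up
    ("S2", (0.70, 0.40)),
    ("S3", (0.58, 0.50)),  # dominated by S1 on both indices
]
```

Here `pareto_front(subsets)` keeps S1 and S2, since neither beats the other on both indices, while S3 is dominated by S1; an MOO search such as the one in the paper explores many such subsets rather than enumerating them.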