Source: Computational Intelligence and Neuroscience (U.S. National Institutes of Health literature)

A Fast Clustering Algorithm for Data with a Few Labeled Instances



Abstract

The diameter of a cluster is the maximum distance between pairs of instances within that cluster, and the split of a cluster is the minimum distance between an instance in the cluster and an instance outside it. Given a few labeled instances, this paper makes two contributions. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learning a distance metric that makes the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than the semidefinite programming model used by most existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve clustering quality.
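The definitions of diameter, split, and RSD above can be made concrete with a small sketch. It is an illustration only, not the paper's clustering or metric learning algorithm: the function names (`max_diameter`, `min_split`, `rsd`), the use of Euclidean distance, and the toy data are all assumptions made here.

```python
from itertools import combinations
from math import dist, inf


def max_diameter(points, labels):
    """Largest distance between two instances that share a cluster label."""
    return max(
        (dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)
         if labels[i] == labels[j]),
        default=0.0,
    )


def min_split(points, labels):
    """Smallest distance between two instances in different clusters."""
    return min(
        (dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)
         if labels[i] != labels[j]),
        default=inf,
    )


def rsd(points, labels):
    """Ratio of the minimum split to the maximum diameter."""
    d = max_diameter(points, labels)
    s = min_split(points, labels)
    # Assumed convention: treat the ratio as infinite when every cluster
    # has zero diameter (for example, all clusters are singletons).
    return inf if d == 0 else s / d


# Toy data (hypothetical): two tight, well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]

print(max_diameter(points, labels))  # 1.0
print(min_split(points, labels))     # 5.0
print(rsd(points, labels))           # 5.0
```

In this toy partition the RSD is greater than one, which is the regime in which the abstract states the clustering algorithm returns optimal solutions.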
