ACM Transactions on Knowledge Discovery from Data

Smart Multitask Bregman Clustering and Multitask Kernel Clustering

Abstract

Traditional clustering algorithms handle a single clustering task on a single dataset. In practice, however, many tasks are related, which motivates multitask clustering. Several multitask clustering algorithms have been proposed recently, and among them multitask Bregman clustering (MBC) is widely applicable. MBC alternately updates clusters and learns relationships between the clusters of different tasks, so that the two phases boost each other. However, the boosting does not always improve clustering performance; it can also degrade it. A second limitation of MBC is that it cannot handle nonlinearly separable data. In this article, we show that in MBC, using cluster relationships to boost the cluster-updating phase can have negative effects: under some conditions, cluster centroids become skewed. We propose a smart multitask Bregman clustering (S-MBC) algorithm that identifies the negative effects of the boosting and avoids them when they occur. We then propose a multitask kernel clustering (MKC) framework for nonlinearly separable data, which applies a framework similar to MBC in the kernel space. We also propose a specific optimization method, quite different from that of MBC, to implement the MKC framework. Because MKC can cause negative effects just as MBC does, we further extend MKC to a smart multitask kernel clustering (S-MKC) framework, in the same way that S-MBC extends MBC. We conduct experiments on 10 real-world multitask clustering datasets to evaluate the performance of S-MBC and S-MKC.
The results on clustering accuracy show that (1) S-MBC and S-MKC perform much better than the original MBC algorithm; and (2) compared with the convex discriminative multitask relationship clustering (DMTRC) algorithms DMTRC-L and DMTRC-R, which also avoid negative transfer, S-MBC and S-MKC perform worse in the (ideal) case where different tasks have the same number of clusters and the empirical label marginal distribution in each task is evenly distributed, but perform better or comparably in other (more general) cases. Moreover, S-MBC and S-MKC can work on datasets in which different tasks have different numbers of clusters, a setting that violates the assumptions of DMTRC-L and DMTRC-R. The results on efficiency show that S-MBC and S-MKC take more computational time than MBC but less than DMTRC-L and DMTRC-R. Overall, S-MBC and S-MKC are competitive with state-of-the-art multitask clustering algorithms when accuracy, efficiency, and applicability are considered together.
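
The abstract does not give the update rules of MBC or S-MBC, so the following is only a rough sketch of the single-task building block they rest on: Bregman hard clustering, which alternates nearest-centroid assignment under a Bregman divergence with a mean update of each centroid. Everything named here is an assumption made for illustration, not the authors' method. That includes the function names, the `init` parameter, and in particular `guarded_update` with its tolerance `tol`, a hypothetical acceptance test that only mimics the idea of rejecting a cross-task update when it hurts the task's own clustering loss.

```python
import numpy as np

def sq_euclidean(X, C):
    """Squared Euclidean distance, the Bregman divergence generated by ||x||^2."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def gen_kl(X, C, eps=1e-12):
    """Generalized KL divergence (generated by sum x*log x); expects positive data."""
    X_, C_ = X[:, None, :] + eps, C[None, :, :] + eps
    return (X_ * np.log(X_ / C_) - X_ + C_).sum(axis=2)

def bregman_hard_clustering(X, k, divergence=sq_euclidean, init=None,
                            n_iter=50, seed=0):
    """Single-task Bregman hard clustering (NOT the multitask MBC algorithm)."""
    X = np.asarray(X, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        C = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        C = np.asarray(init, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the chosen divergence.
        labels = divergence(X, C).argmin(axis=1)
        # Update step: the arithmetic mean minimizes the total Bregman
        # divergence from a cluster's points to its representative.
        new_C = C.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_C[j] = members.mean(axis=0)
        if np.allclose(new_C, C):
            break
        C = new_C
    # Recompute labels so they match the returned centroids.
    return C, divergence(X, C).argmin(axis=1)

def guarded_update(X, C_own, C_candidate, divergence=sq_euclidean, tol=0.05):
    """Hypothetical guard, not from the paper: accept candidate centroids
    (e.g. ones pulled toward another task) only if this task's own loss
    grows by at most a factor of (1 + tol); otherwise keep C_own."""
    own = divergence(X, C_own).min(axis=1).sum()
    cand = divergence(X, C_candidate).min(axis=1).sum()
    return C_candidate if cand <= (1.0 + tol) * own else C_own
```

Calling `bregman_hard_clustering(X, k)` on a task's data returns its centroids and labels. In a multitask setting, a guard such as `guarded_update` would sit between the relationship-learning phase and the centroid update. The criterion S-MBC actually uses to detect skewed centroids is defined in the article and may differ substantially from this one.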