Journal: Pattern Recognition Letters

Speeding-up the kernel k-means clustering method: A prototype based hybrid approach

Abstract

The kernel k-means clustering method has proved effective in identifying non-isotropic and linearly inseparable clusters in the input space. However, it is not suitable for large datasets because its time complexity is quadratic in the size of the dataset. This paper presents a simple prototype-based hybrid approach to speed up the kernel k-means clustering method for large datasets. The proposed method works in two stages. First, the dataset is partitioned into a number of small grouplets by using the leaders clustering method, which takes the size of each grouplet, called the threshold τ, as an input parameter. The conventional leaders clustering method is modified so that these grouplets are formed in the kernel-induced feature space, but each grouplet is represented by a pattern (called its leader) in the input space. The dataset is re-indexed according to these grouplets. Next, the kernel k-means clustering method is applied to the set of leaders to derive a partition of the leaders set. Finally, each leader is replaced by its grouplet to obtain a partition of the entire dataset. Both the time complexity and the space complexity of the proposed method are O(n + p^2), where n is the size of the dataset and p is the number of leaders. The overall running time and the quality of the clustering result depend on the threshold τ and on the order in which the dataset is scanned. This paper presents a study of how the input parameter τ affects the overall running time and the clustering quality obtained by the proposed method. Further, it is shown both theoretically and experimentally how the order of scanning the dataset affects the clustering result. The proposed method is also compared with other recent methods proposed to speed up the kernel k-means clustering method. An experimental study with several real-world and synthetic datasets shows that, for an appropriate value of τ, the proposed method can significantly reduce the computation time at the cost of a small loss in clustering quality, particularly for large datasets.
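The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the Gaussian (RBF) kernel and its `gamma` parameter, the reading of the threshold as a maximum feature-space distance `tau` between a pattern and its leader, the random initialisation of kernel k-means, and all function names (`rbf_kernel`, `kernel_leaders`, `kernel_kmeans`, `hybrid_kernel_kmeans`).

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_leaders(x, tau, gamma=1.0):
    """Single scan over x: a pattern joins the first leader whose feature-space
    distance is at most tau, otherwise it becomes a new leader.
    Returns the leader indices and, for each pattern, the slot of its leader."""
    leaders, member_of = [], np.empty(len(x), dtype=int)
    for i in range(len(x)):
        if leaders:
            k = rbf_kernel(x[i:i + 1], x[leaders], gamma)[0]
            # For an RBF kernel K(x, x) = 1, so ||phi(x) - phi(l)||^2 = 2 - 2 K(x, l).
            dist = np.sqrt(np.maximum(2.0 - 2.0 * k, 0.0))
            close = np.flatnonzero(dist <= tau)
            if close.size:
                member_of[i] = close[0]
                continue
        member_of[i] = len(leaders)
        leaders.append(i)
    return np.array(leaders), member_of


def kernel_kmeans(kmat, n_clusters, n_iter=50, seed=0):
    """Plain kernel k-means on a precomputed p x p kernel matrix."""
    rng = np.random.default_rng(seed)
    p = kmat.shape[0]
    labels = rng.integers(n_clusters, size=p)  # assumed: random initial labels
    for _ in range(n_iter):
        dist = np.full((p, n_clusters), np.inf)
        for c in range(n_clusters):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                continue  # an empty cluster stays empty in this simple sketch
            # ||phi(x_i) - m_c||^2 = K_ii - 2 mean_j K_ij + mean_jl K_jl
            dist[:, c] = (np.diag(kmat)
                          - 2.0 * kmat[:, idx].mean(axis=1)
                          + kmat[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def hybrid_kernel_kmeans(x, n_clusters, tau, gamma=1.0, seed=0):
    """Stage 1: leaders in feature space. Stage 2: kernel k-means on the leaders.
    Each pattern then inherits the cluster label of its leader."""
    leaders, member_of = kernel_leaders(x, tau, gamma)
    kmat = rbf_kernel(x[leaders], x[leaders], gamma)  # only p x p, not n x n
    leader_labels = kernel_kmeans(kmat, n_clusters, seed=seed)
    return leader_labels[member_of], leaders
```

Only the p x p kernel matrix over the leaders is stored, which is where the saving over an n x n kernel matrix comes from. Because `kernel_leaders` assigns each pattern to the first qualifying leader found during a single scan, reordering the rows of `x` can change the leaders and therefore the final partition, which is the scan-order dependence the abstract mentions.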
