
A scalable framework for cluster ensembles


Abstract

An ensemble of clustering solutions, or partitions, may be generated for a number of reasons. If the data set is very large, clustering may be done on disjoint subsets of tractable size. The data may also be distributed across different sites, in which case a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, each represented by a set of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of the data that is comparable to those of the best existing approaches, yet they scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against three baselines: the best existing cluster ensemble merging approaches, clustering all the data at once, and a clustering algorithm designed for very large data sets. The comparison is done for both fuzzy and hard k-means based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or to clustering all the data at once, while providing very large speedups.
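The abstract does not spell out the two merging algorithms, so the following is only a minimal sketch of the general idea of centroid-based merging, not the paper's method. It assumes hard k-means on each disjoint subset, then a second, size-weighted k-means over the pooled centroids. The function names (`assign`, `farthest_first`, `kmeans`, `merge_by_centroids`), the farthest-first seeding, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center for every point (hard assignment)."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0], then add the farthest point."""
    chosen = [0]
    min_dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(min_dist.argmax())
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen].copy()

def kmeans(points, k, weights=None, iters=50):
    """Plain hard k-means with optional per-point weights; returns the centers."""
    points = np.asarray(points, dtype=float)
    weights = np.ones(len(points)) if weights is None else np.asarray(weights, dtype=float)
    centers = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the previous center if a cluster goes empty
                centers[j] = np.average(points[mask], axis=0, weights=weights[mask])
    return centers

def merge_by_centroids(subsets, k):
    """Cluster each subset separately, then cluster the pooled centroids."""
    pooled, sizes = [], []
    for subset in subsets:
        centers = kmeans(subset, k)
        counts = np.bincount(assign(subset, centers), minlength=k)
        pooled.append(centers[counts > 0])   # drop empty clusters
        sizes.append(counts[counts > 0])     # cluster sizes act as weights
    pooled = np.vstack(pooled)
    sizes = np.concatenate(sizes).astype(float)
    # Only k centroids per subset are kept, never the per-point labels.
    return kmeans(pooled, k, weights=sizes)

# Synthetic demo: three well-separated blobs split into four disjoint subsets.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data = np.vstack([c + 0.3 * rng.standard_normal((200, 2)) for c in true_centers])
rng.shuffle(data)                      # shuffle rows in place
subsets = np.array_split(data, 4)
merged = merge_by_centroids(subsets, k=3)
final_labels = assign(data, merged)    # final partition of all the data
```

In this sketch the merge step only ever sees one centroid and one count per cluster, which is why a centroid representation can need far less memory than approaches that combine per-point label vectors. The actual speedups reported in the abstract depend on the paper's own algorithms and are not reproduced here.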
