A scalable framework for cluster ensembles

Hore P; Hall LO; Goldgof DB


Abstract

An ensemble of clustering solutions, or partitions, may be generated for a number of reasons. If the data set is very large, clustering may be done on disjoint subsets of tractable size. The data may also be distributed across different sites, in which case a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of the data that is comparable to that of the best existing approaches, yet they scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, against clustering all the data at once, and against a clustering algorithm designed for very large data sets. The comparison is done for both fuzzy and hard k-means based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label-vector approach or to clustering all the data at once, while providing very large speedups.
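
The abstract does not spell out the two merging algorithms, so the following is only a minimal sketch of the general idea of combining partitions that are represented by sets of cluster centers. It is not the authors' method. It assumes hard k-means on each disjoint subset, followed by a size-weighted k-means over the pooled centroids. The function names (`farthest_first_init`, `assign`, `weighted_kmeans`, `merge_centroid_ensemble`) and the farthest-first initialisation are illustrative choices made for this sketch.

```python
import numpy as np


def farthest_first_init(points, k):
    """Deterministic init (an assumption of this sketch): start at points[0],
    then repeatedly add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[dists.argmax()])
    return np.array(centers, dtype=float)


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def weighted_kmeans(points, k, weights=None, n_iter=20):
    """Lloyd's hard k-means with optional per-point weights.
    Returns (centers, labels)."""
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    weights = np.asarray(weights, dtype=float)
    centers = farthest_first_init(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            mask = labels == j
            if mask.any():  # an empty cluster keeps its previous center
                centers[j] = np.average(
                    points[mask], axis=0, weights=weights[mask]
                )
    return centers, assign(points, centers)


def merge_centroid_ensemble(data, k, n_subsets):
    """Cluster disjoint subsets separately, then merge the resulting
    centroid sets by clustering the centroids, weighted by cluster size."""
    pooled_centers, pooled_sizes = [], []
    for subset in np.array_split(data, n_subsets):
        centers, labels = weighted_kmeans(subset, k)
        sizes = np.bincount(labels, minlength=k)
        keep = sizes > 0  # drop centroids that ended up with no members
        pooled_centers.append(centers[keep])
        pooled_sizes.append(sizes[keep])
    final_centers, _ = weighted_kmeans(
        np.vstack(pooled_centers), k, weights=np.concatenate(pooled_sizes)
    )
    return final_centers


if __name__ == "__main__":
    # Synthetic data: three well-separated 2-D blobs, shuffled so that each
    # disjoint subset sees points from every blob.
    rng = np.random.default_rng(0)
    true_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    data = np.vstack(
        [c + rng.normal(scale=0.3, size=(1000, 2)) for c in true_centers]
    )
    rng.shuffle(data)
    print(merge_centroid_ensemble(data, k=3, n_subsets=4))
```

In this sketch the merge step only ever sees `k * n_subsets` centroids plus their sizes, never the raw points, which illustrates why a centroid-based merge can be cheap in both time and memory compared with approaches that carry a label for every data point.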

Bibliographic details

Source: Pattern Recognition: The Journal of the Pattern Recognition Society, 2009, Issue 5, 13 pages
Authors: Hore P; Hall LO; Goldgof DB
Original format: PDF
Language: English (eng)
Chinese Library Classification: Computing technology, computer technology
Keywords: Clustering; Hard/fuzzy-k-means; Large data sets; Ensemble; Scalability; Single pass algorithm
