首页> 外文会议>Annual European symposium on algorithms >BICO: BIRCH Meets Coresets for k-Means Clustering

【24h】

BICO: BIRCH Meets Coresets for k-Means Clustering

机译：BICO：BIRCH满足k-Means聚类的核心集

获取原文

页面导航

摘要
著录项
相似文献
相关主题

摘要

We design a data stream algorithm for the k-means problem, called BICO, that combines the data structure of the SIGMOD Test of Time award winning algorithm BIRCH with the theoretical concept of coresets for clustering problems. The k-means problem asks for a set C of k centers minimizing the sum of the squared distances from every point in a set P to its nearest center in C. In a data stream, the points arrive one by one in arbitrary order and there is limited storage space. BICO computes high quality solutions in a time short in practice. First, BICO computes a summary S of the data with a provable quality guarantee: For every center set C, S has the same cost as P up to a (1 + ε)-factor, i. e., S is a coreset. Then, it runs k-means++ on S. We compare BICO experimentally with popular and very fast heuristics (BIRCH, MacQueen [24]) and with approximation algorithms (Stream-KM++ , StreamLS ) with the best known quality guarantees. We achieve the same quality as the approximation algorithms mentioned with a much shorter running time, and we get much better solutions than the heuristics at the cost of only a moderate increase in running time.

机译：我们设计了一种针对k均值问题的数据流算法，称为BICO，该算法将SIGMOD时间测试获奖算法BIRCH的数据结构与用于聚类问题的核心集的理论概念相结合。 k均值问题要求由k个中心组成的集合C最小化从集合P中的每个点到它在C中最近的中心的平方距离之和。在数据流中，这些点以任意顺序一个到一个地到达是有限的存储空间。在实践中，BICO可以在短时间内计算出高质量的解决方案。首先，BICO计算出具有可证明质量保证的数据摘要S：对于每个中心集C，S的成本与P相同，但高达（1 +ε）因子，即。例如，S是一个核心集。然后，它在S上运行k-means ++。我们通过实验将BICO与流行且非常快的启发式算法（BIRCH，MacQueen [24]）和近似算法（Stream-KM ++，StreamLS）进行比较，该算法具有最著名的质量保证。我们以更短的运行时间实现了与上述近似算法相同的质量，并且以仅适度增加运行时间为代价，获得了比启发式方法更好的解决方案。

著录项

来源
《Annual European symposium on algorithms 》|2013年|481-492|共12页
会议地点
作者
Hendrik Fichtenberger; Marc Gille; Melanie Schmidt; Chris Schwiegelshohn; Christian Sohler;
展开▼
作者单位

展开▼
会议组织
原文格式 PDF
正文语种
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. TURNING BIG DATA INTO TINY DATA: CONSTANT-SIZE CORESETS FOR k-MEANS, PCA, AND PROJECTIVE CLUSTERING [J] . Feldman Dan, Schmidt Melanie, Sohler Christian SIAM Journal on Computing . 2020 ,第3期

机译：将大数据转化为微小数据：K-Means，PCA和投影群集的常量尺寸导致
2. Scalable k-Means Clustering via Lightweight Coresets [J] . Olivier Bachem, Mario Lucic, Andreas Krause SIGKDD explorations . 2018 ,第Udisk期

机译：可扩展的K-meary通过轻量级冠状群体聚类
3. ON CORESETS FOR k-MEDIAN AND k-MEANS CLUSTERING IN METRIC AND EUCLIDEAN SPACES AND THEIR APPLICATIONS [J] . Chen K SIAM Journal on Computing . 2010 ,第3期

机译：度和欧空间中k均值和k均值聚类的共预设及其应用
4. BICO: BIRCH Meets Coresets for k-Means Clustering [C] . Hendrik Fichtenberger, Marc Gille, Melanie Schmidt, Workshop on Algorithms in Bioinformatics . 2013

机译：BICO：桦木满足K-Means聚类的冠状困境
5. Automated Parsing of Flexible Molecular Systems Using Principal Component Analysis and K-Means Clustering Techniques [D] . Nwerem, Matthew Jonathan Chukwunenye. 2021

机译：使用主成分分析和K-Means聚类技术自动解析灵活分子系统
6. Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm Minimum Spanning Tree and Hierarchical Clustering in an Applied Study [O] . Saeedeh Pourahmad, Atefeh Basirat, Amir Rahimi, 2020

机译：初始簇质心的确定是否提高了K-Means聚类算法的性能？应用研究中遗传算法最小生成树和分层聚类的三种混合方法的比较
7. Coresets and streaming algorithms for the k-means problem and related clustering objectives [O] . Schmidt Melanie 2014

机译：用于k均值问题和相关聚类目标的核集和流算法

BICO: BIRCH Meets Coresets for k-Means Clustering

摘要

著录项

相似文献

相关主题

期刊订阅