
Robust $k$-means++


Abstract

A good seeding or initialization of cluster centers for the $k$-means method is important from both theoretical and practical standpoints. The $k$-means objective is inherently non-robust and sensitive to outliers. A popular seeding such as $k$-means++ [3], which is more likely to pick outliers in the worst case, may compound this drawback, thereby affecting the quality of clustering on noisy data. For any $0 < \delta \leq 1$, we show that using a mixture of $D^{2}$-sampling [3] and uniform sampling, we can pick $O(k/\delta)$ candidate centers with the following guarantee: they contain some $k$ centers that give an $O(1)$-approximation to the optimal robust $k$-means solution while discarding at most $\delta n$ more points than the outliers discarded by the optimal solution. That is, if the optimal solution discards its farthest $\beta n$ points as outliers, our solution discards its $(\beta + \delta) n$ points as outliers. The constant factor in our $O(1)$-approximation does not depend on $\delta$. This is an improvement over previous results for $k$-means with outliers based on LP relaxation and rounding [7] and local search [17]. The $O(k/\delta)$-sized subset can be found in time $O(ndk)$. Our \emph{robust} $k$-means++ is also easily amenable to scalable, faster, parallel implementations of $k$-means++ [5]. Our empirical results compare the above \emph{robust} variant of $k$-means++ with the usual $k$-means++, uniform random seeding, threshold $k$-means++ [6], and local search on real-world and synthetic data.
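The core idea in the abstract is a seeding rule that mixes $D^{2}$-sampling (which chases far-away points and hence outliers) with uniform sampling (which is insensitive to outliers), drawing $O(k/\delta)$ candidates in total. The following is a minimal sketch of that mixture, not the paper's algorithm: the 50/50 mixture weight, the candidate count $\lceil k/\delta \rceil$, and the function name are illustrative assumptions.

```python
import numpy as np

def robust_kmeans_pp_seeding(X, k, delta, rng=None):
    """Hedged sketch of mixture seeding: draw O(k/delta) candidate
    centers, each from a 50/50 mixture of D^2-sampling and uniform
    sampling. Mixture weight and candidate count are assumptions,
    not the constants from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    m = int(np.ceil(k / delta))  # O(k/delta) candidate centers
    # first center: uniform at random, as in standard k-means++
    first = rng.integers(n)
    centers = [X[first]]
    # squared distance of each point to its nearest chosen center
    d2 = np.sum((X - X[first]) ** 2, axis=1)
    for _ in range(m - 1):
        if rng.random() < 0.5 and d2.sum() > 0:
            # D^2-sampling: probability proportional to squared distance
            idx = rng.choice(n, p=d2 / d2.sum())
        else:
            # uniform sampling: hedges against repeatedly picking outliers
            idx = rng.integers(n)
        centers.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centers)
```

Per the abstract, a good set of $k$ centers can then be selected from these candidates; the sketch above covers only the candidate-generation step, which runs in $O(ndk/\delta)$ time for this naive implementation.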
