Source: JMLR: Workshop and Conference Proceedings
Fast Noise Removal for k-Means Clustering


Abstract

This paper considers k-means clustering in the presence of noise. k-means clustering is known to be highly sensitive to noise, so noise should be removed to obtain a quality solution. A popular formulation of this problem is k-means clustering with outliers: discard up to a specified number z of points as noise/outliers, then find a k-means solution on the remaining data. The problem has received significant attention, yet current algorithms with theoretical guarantees suffer from either high running time or an inherent loss in solution quality. The main contribution of this paper is twofold. First, we develop a simple greedy algorithm with provably strong worst-case guarantees. The greedy algorithm adds a simple preprocessing step to remove noise, which can be combined with any k-means clustering algorithm. This algorithm gives the first pseudo-approximation-preserving reduction from k-means with outliers to k-means without outliers. Second, we show how to construct a coreset of size O(k log n). Combined with our greedy algorithm, this yields a scalable, near-linear-time algorithm. The theoretical contributions are verified experimentally by demonstrating that the algorithm quickly removes noise and obtains a high-quality clustering.
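
To make the objective in the abstract concrete, the sketch below evaluates the k-means-with-outliers cost for a fixed set of centers. It is not the paper's greedy algorithm or its coreset construction; it only illustrates the problem definition. It relies on one standard observation: once the centers are fixed, the cheapest choice of z points to discard is the z points farthest from their nearest center. The function names `sq_dist` and `kmeans_outlier_cost` and the sample data are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_outlier_cost(
    points: Sequence[Point], centers: Sequence[Point], z: int
) -> Tuple[float, List[int]]:
    """Cost of `centers` when up to z points may be discarded as outliers.

    Illustrative helper (hypothetical name), assuming fixed centers:
    the z points farthest from their nearest center are discarded, and
    the cost is the sum of squared distances of the remaining points.
    Returns (cost, sorted indices of the discarded points).
    """
    if not centers:
        raise ValueError("at least one center is required")
    if z < 0:
        raise ValueError("z must be non-negative")

    # Squared distance from each point to its nearest center.
    dists = [min(sq_dist(p, c) for c in centers) for p in points]

    # Indices ordered from farthest to nearest.
    order = sorted(range(len(points)), key=lambda i: dists[i], reverse=True)

    outliers = sorted(order[:z])
    cost = sum(dists[i] for i in order[z:])
    return cost, outliers


if __name__ == "__main__":
    # Two tight clusters plus one far-away noise point (made-up data).
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (100.0, 100.0)]
    ctrs = [(0.0, 0.5), (10.0, 10.5)]
    print(kmeans_outlier_cost(data, ctrs, z=0))
    print(kmeans_outlier_cost(data, ctrs, z=1))
```

With this made-up data, allowing a single outlier removes the far-away point and the cost drops sharply, which reflects the sensitivity to noise that the abstract describes.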
