Journal: Neural Computing & Applications

Rethinking k-means clustering in the age of massive datasets: a constant-time approach


Abstract

We introduce a highly efficient k-means clustering approach. We show that the classical central limit theorem addresses a special case (k = 1) of the k-means problem and then extend it to the general case. Instead of using the full dataset, our algorithm, named k-means-lite, applies the standard k-means to the combination C (of size nk) of all sample centroids obtained from n independent small samples. Unlike ordinary uniform sampling, the approach asymptotically preserves the performance of the original algorithm. In our experiments with a wide range of synthetic and real-world datasets, k-means-lite matches the performance of k-means when C is constructed using 30 samples of size 40 + 2k. Although the 30-sample choice proves to be a generally reliable rule, when the proposed approach is used to scale k-means++ (we call this scaled version k-means-lite++), the performance of k-means++ is matched in several cases using only five samples. These two new algorithms are presented to demonstrate the proposed approach, but the approach can be applied to create a constant-time version of any other k-means clustering algorithm, since it does not modify the internal workings of the base algorithm.
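
The procedure described in the abstract can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the authors' implementation: the function names (`lloyd_kmeans`, `kmeans_lite`), the use of plain Lloyd iterations with random initialization as the "standard k-means", and drawing each small sample without replacement are all assumptions. The defaults of 30 samples and sample size 40 + 2k are taken from the abstract.

```python
import random


def _dist2(p, q):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd_kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means (assumed stand-in for the base algorithm)."""
    centroids = rng.sample(points, k)  # random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: _dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [_mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def kmeans_lite(data, k, n_samples=30, rng=None):
    """Sketch of k-means-lite as described in the abstract.

    Returns (final_centroids, pooled), where pooled is the combination C
    of all sample centroids and has n_samples * k points.
    """
    if rng is None:
        rng = random.Random(0)
    sample_size = min(len(data), 40 + 2 * k)  # sample size from the abstract
    pooled = []
    for _ in range(n_samples):
        # Assumption: each small sample is drawn without replacement.
        sample = rng.sample(data, sample_size)
        pooled.extend(lloyd_kmeans(sample, k, rng))
    # Run the same base algorithm on C instead of on the full dataset.
    return lloyd_kmeans(pooled, k, rng), pooled


if __name__ == "__main__":
    gen = random.Random(1)
    # Hypothetical toy data: two well-separated 2-D blobs.
    data = [(gen.gauss(0, 0.5), gen.gauss(0, 0.5)) for _ in range(500)]
    data += [(gen.gauss(10, 0.5), gen.gauss(10, 0.5)) for _ in range(500)]
    centroids, pooled = kmeans_lite(data, k=2)
    print(len(pooled), centroids)
```

In this sketch the clustering work touches only n(40 + 2k) sampled points plus the nk pooled centroids, so it does not grow with the size of the full dataset, which is consistent with the "constant-time" claim. Swapping `lloyd_kmeans` for a k-means++ routine would give the k-means-lite++ variant mentioned above, since the wrapper does not modify the base algorithm.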
