首页> 外文OA文献 >Accelerating Exact k-means Algorithms with Geometric Reasoning
【2h】

Accelerating Exact k-means Algorithms with Geometric Reasoning

机译:利用几何推理加速精确的k均值算法

摘要

We present new algorithms for the k-means clustering problem. They use the kd-tree data structure to reduce the large number of nearest-neighbor queries issued by the traditional algorithm. Sufficient statistics are stored in the nodes of the kd-tree. Then, an analysis of the geometry of the current cluster centers results in great reduction of the work needed to update the centers. Our algorithms behave exactly as the traditional k-means algorithm. Proofs of correctness are included. The kd-tree can also be used to initialize the k-means starting centers efficiently. Our algorithms can be easily extended to provide fast ways of computing the error of a given cluster assignment, regardless of the method in which those clusters were obtained. We also show how to use them in a setting which allows approximate clustering results, with the benefit of running faster. We have implemented and tested our algorithms on both real and simulated data. Results show a speedup factor of up to 170 on real astrophysical data, and superiority over the naive algorithm on simulated data in up to 5 dimensions. Our algorithms scale well with respect to the number of points and number of centers, allowing for clustering with tens of thousands of centers.
机译:我们提出了针对k均值聚类问题的新算法。他们使用kd-tree数据结构来减少传统算法发出的大量最近邻居查询。足够的统计信息存储在kd-tree的节点中。然后,对当前群集中心的几何形状进行分析可以大大减少更新中心所需的工作。我们的算法与传统的k均值算法完全一样。包括正确性证明。 kd树还可以用于有效地初始化k均值起始中心。我们的算法可以轻松扩展,以提供快速的方法来计算给定群集分配的误差,而与获得这些群集的方法无关。我们还将展示如何在允许近似聚类结果的设置中使用它们,并具有运行速度更快的好处。我们已经在真实和模拟数据上实现并测试了我们的算法。结果显示,实际天体数据的加速因子高达170,在多达5个维度的模拟数据上,其优于朴素算法的优势。我们的算法在点数和中心数方面可以很好地扩展,从而可以与成千上万个中心进行聚类。

著录项

  • 作者

    Pelleg Dan; Moore Andrew;

  • 作者单位
  • 年度 1999
  • 总页数
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号