Conference: International Neural Network Society Conference on Big Data

A-BIRCH: Automatic Threshold Estimation for the BIRCH Clustering Algorithm

Abstract

Clustering algorithms have recently regained attention with the availability of large datasets and the rise of parallelized computing architectures. However, most clustering algorithms do not scale well with increasing dataset sizes and require proper parametrization to produce correct results. In this paper we present A-BIRCH, an approach for automatic threshold estimation for the BIRCH clustering algorithm using the Gap Statistic. This approach renders the global clustering step of BIRCH unnecessary and does not require prior knowledge of the expected number of clusters. This is achieved by analyzing a small representative subset of the data to extract attributes such as the cluster radius and the minimal cluster distance. These attributes are then used to compute a threshold that results, with high probability, in the correct clustering of elements. To analyze the representative subset, we parallelized the Gap Statistic to improve performance and ensure scalability.
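The idea of deriving a BIRCH threshold from the cluster radius and the minimal cluster distance can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not give the paper's formula, so `estimate_threshold` uses a hypothetical rule (the midpoint between the largest sample radius and half the smallest centroid distance), `flat_birch` is a simplified flat stand-in for BIRCH's CF-tree, and the sample is assumed to be already grouped into clusters (the step for which the paper uses the Gap Statistic).

```python
import math


class ClusterFeature:
    """BIRCH cluster feature: point count n, linear sum ls, sum of squared norms ss."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, point):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self, extra=None):
        """RMS distance of members to the centroid, optionally as if `extra` were added."""
        n, ls, ss = self.n, self.ls, self.ss
        if extra is not None:
            n += 1
            ls = [a + b for a, b in zip(ls, extra)]
            ss += sum(x * x for x in extra)
        centroid_sq = sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(ss / n - centroid_sq, 0.0))  # clamp rounding noise


def estimate_threshold(sample_clusters):
    """Hypothetical rule, NOT taken from the paper: midpoint between the largest
    cluster radius and half the smallest distance between cluster centroids."""
    features = []
    for points in sample_clusters:
        cf = ClusterFeature(len(points[0]))
        for p in points:
            cf.add(p)
        features.append(cf)
    r_max = max(cf.radius() for cf in features)
    centroids = [cf.centroid() for cf in features]
    d_min = min(math.dist(a, b)
                for i, a in enumerate(centroids) for b in centroids[i + 1:])
    return 0.5 * (r_max + 0.5 * d_min)


def flat_birch(points, threshold):
    """Simplified flat variant of BIRCH's insertion step (no CF-tree): absorb each
    point into the nearest feature if its radius stays within the threshold,
    otherwise open a new feature."""
    features = []
    for p in points:
        nearest = min(features, key=lambda cf: math.dist(cf.centroid(), p), default=None)
        if nearest is not None and nearest.radius(p) <= threshold:
            nearest.add(p)
        else:
            cf = ClusterFeature(len(p))
            cf.add(p)
            features.append(cf)
    return features


# Toy data: three 3x3 grids of points around well-separated centers.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1), (1, -1), (-1, 1)]
# Interleave the clusters so that points arrive in mixed order.
stream = [(cx + dx, cy + dy) for dx, dy in offsets for cx, cy in centers]
# "Representative subset": the first five points of each cluster, assumed already grouped.
sample = [[(cx + dx, cy + dy) for dx, dy in offsets[:5]] for cx, cy in centers]

threshold = estimate_threshold(sample)
features = flat_birch(stream, threshold)
print(f"threshold = {threshold:.3f}, clusters found = {len(features)}")
```

On this toy data the estimated threshold recovers the three clusters directly, without a separate global clustering pass, whereas a much smaller threshold such as 0.4 splits them into many small features. Radius-based absorption depends on the order in which points arrive, so the sketch should not be read as a guarantee for arbitrary data.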
