首页> 外文会议>Workshop on Algorithm Engineering and Experiments >A Practical Algorithm for Spatial Agglomerative Clustering
【24h】

A Practical Algorithm for Spatial Agglomerative Clustering

机译:一种用于空间凝聚聚类的实用算法

获取原文

摘要

We study an agglomerative clustering problem motivated by visualizing disjoint glyphs (represented by geometric shapes) centered at specific locations on a geographic map. As we zoom out, the glyphs grow and start to overlap. We replace overlapping glyphs by one larger merged glyph to maintain disjointness. Our goal is to compute the resulting hierarchical clustering efficiently in practice. A straightforward algorithm for such spatial agglomerative clustering runs in O (n~2 log n) time, where n is the number of glyphs. This is not efficient enough for many real-world datasets which contain up to tens or hundreds of thousands of glyphs. Recently the theoretical upper bound was improved to O (na (n) log~7 n) time, where a(n) is the extremely slow growing inverse Ackermann function. Although this new algorithm is asymptotically much faster than the naive algorithm, from a practical point of view, it does not perform better for n < 10~6. In this paper we present a new agglomerative clustering algorithm which works efficiently in practice. Our algorithm relies on the use of quadtrees to speed up spatial computations. Interestingly, even in non-pathological datasets we can encounter large glyphs that intersect many quadtree cells and that are involved in many clustering events. We therefore devise a special strategy to handle such large glyphs. We test our algorithm on several synthetic and real-world datasets and show that it performs well in practice.
机译:我们研究了通过在地理图上的特定位置以特定位置为中心的不相交的字形(由几何形状表示的差异字形(表示)激励的凝聚聚类问题。当我们缩小时,字形成长并开始重叠。我们通过一个更大合并的字形替换重叠的字形来维持差异。我们的目标是在实践中有效地计算结果的分层聚类。用于这种空间聚类聚类的直接算法在O(n〜2 log n)的时间内运行,其中n是字形的数量。这对于许多包含最多数十或数十万个字形的真实世界数据集来说并不有效。最近,理论上限得到改善为O(na(n)log〜7 n)时间,其中a(n)是极其缓慢的生长逆Ackermann函数。虽然这种新算法渐近地比天真算法快得多,但从实际的角度来看,它不会对N <10〜6表现更好。在本文中,我们提出了一种新的聚类聚类算法,其实践有效。我们的算法依赖于使用四分表来加速空间计算。有趣的是,即使在非病理数据集中,我们也可以遇到与许多四叉细胞相交的大型字形,并且涉及许多聚类事件。因此,我们制定了一个特殊的战略来处理这种大字形。我们在几个合成和现实世界数据集中测试我们的算法,并显示它在实践中表现得很好。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号