A Practical Algorithm for Spatial Agglomerative Clustering

Abstract

We study an agglomerative clustering problem motivated by visualizing disjoint glyphs (represented by geometric shapes) centered at specific locations on a geographic map. As we zoom out, the glyphs grow and start to overlap. We replace overlapping glyphs by one larger merged glyph to maintain disjointness. Our goal is to compute the resulting hierarchical clustering efficiently in practice. A straightforward algorithm for such spatial agglomerative clustering runs in $O(n^2 \log n)$ time, where $n$ is the number of glyphs. This is not efficient enough for many real-world datasets that contain up to tens or hundreds of thousands of glyphs. Recently, the theoretical upper bound was improved to $O(n \alpha(n) \log^7 n)$ time, where $\alpha(n)$ is the extremely slowly growing inverse Ackermann function. Although this new algorithm is asymptotically much faster than the naive algorithm, from a practical point of view it does not perform better for $n < 10^6$. In this paper we present a new agglomerative clustering algorithm that works efficiently in practice. Our algorithm relies on quadtrees to speed up spatial computations. Interestingly, even in non-pathological datasets we can encounter large glyphs that intersect many quadtree cells and that are involved in many clustering events. We therefore devise a special strategy to handle such large glyphs. We test our algorithm on several synthetic and real-world datasets and show that it performs well in practice.
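
The straightforward baseline described above can be sketched as an event-driven simulation. For every pair of glyphs, compute the zoom scale at which they first touch, then process these events in order from a priority queue. After each merge, skip stale events and schedule new ones for the merged glyph. This is a minimal sketch under assumptions that are not taken from the paper. Glyphs are modelled as circles whose radius is weight * t at zoom scale t. A merged glyph takes the summed weight and is centred at the weighted centroid of its parts. The names Glyph, touch_time and agglomerate are hypothetical. The sketch creates O(n^2) events in total at O(log n) per heap operation, which matches the $O(n^2 \log n)$ bound. It is not the authors' quadtree-based algorithm.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical glyph model: a circle centred at (x, y) whose radius
    at zoom scale t is weight * t (an assumption, not from the paper)."""
    x: float
    y: float
    weight: float


def touch_time(a: Glyph, b: Glyph) -> float:
    """Zoom scale at which two growing circles first touch:
    distance(a, b) == (a.weight + b.weight) * t."""
    return math.hypot(a.x - b.x, a.y - b.y) / (a.weight + b.weight)


def agglomerate(glyphs):
    """Naive event-driven spatial agglomerative clustering.

    Takes a list of Glyph objects and returns the merge sequence as
    (t, id_a, id_b, new_id) tuples; input glyphs get ids 0..n-1 and
    every merged glyph gets the next free id.
    """
    pool = dict(enumerate(glyphs))
    alive = set(pool)
    next_id = len(pool)

    # Schedule a touch event for every pair of input glyphs: O(n^2) events.
    heap = []
    for a in range(len(glyphs)):
        for b in range(a + 1, len(glyphs)):
            heapq.heappush(heap, (touch_time(pool[a], pool[b]), a, b))

    merges = []
    while len(alive) > 1:
        t, a, b = heapq.heappop(heap)
        if a not in alive or b not in alive:
            continue  # stale event: one of the glyphs was already merged away
        ga, gb = pool[a], pool[b]
        w = ga.weight + gb.weight
        # Merged glyph: summed weight, centred at the weighted centroid.
        merged = Glyph((ga.x * ga.weight + gb.x * gb.weight) / w,
                       (ga.y * ga.weight + gb.y * gb.weight) / w, w)
        alive -= {a, b}
        new_id, next_id = next_id, next_id + 1
        pool[new_id] = merged
        merges.append((t, a, b, new_id))
        # The merged glyph may already overlap a neighbour at time t;
        # clamping with max() makes that merge fire immediately.
        for k in alive:
            heapq.heappush(heap, (max(t, touch_time(merged, pool[k])), k, new_id))
        alive.add(new_id)
    return merges


merges = agglomerate([Glyph(0, 0, 1), Glyph(4, 0, 1), Glyph(10, 0, 1)])
for t, a, b, new_id in merges:
    print(f"t={t:.3f}: merge {a} + {b} -> {new_id}")
# t=2.000: merge 0 + 1 -> 3
# t=2.667: merge 2 + 3 -> 4
```

Stale events are skipped lazily instead of being deleted from the heap, which keeps the sketch short. The all-pairs initialization alone is quadratic. This is why such a baseline struggles with the dataset sizes mentioned in the abstract.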

Bibliographic record

Source: Workshop on Algorithm Engineering and Experiments, 2019, p. 225 (12 pages)
Authors: Thom Castermans; Bettina Speckmann; Kevin Verbeek
Original format: PDF
Chinese Library Classification: TP301.6-53

Similar documents

1. An improved frequency based agglomerative clustering algorithm for detecting distinct clusters on two dimensional dataset [J]. Madheswaran M., Sreedhar Kumar S. Journal of Engineering and Technology Research, 2017, no. 4.
2. Clustering Based Multi Flip Flop Merging Using Agglomerative Clustering Algorithm [J]. T. Rathidevi, R. Premkumar. International Journal of Innovative Research in Science, Engineering and Technology, 2014, no. 3.
3. Agglomerative Fuzzy K-Means Clustering Algorithm with Selection of Number of Clusters [J]. Li M.J., Ng M.K., Cheung Y.-m. IEEE Transactions on Knowledge and Data Engineering, 2008, no. 11.
4. A Practical Algorithm for Spatial Agglomerative Clustering [C]. Thom Castermans, Bettina Speckmann, Kevin Verbeek. Workshop on Algorithm Engineering and Experiments, 2019.
5. Efficient Algorithms for Hierarchical Agglomerative Clustering [D]. Anandan, Ajay. 2013.
6. AGNEP: An Agglomerative Nesting Clustering Algorithm for Phenotypic Dimension Reduction in Joint Analysis of Multiple Phenotypes [O]. Fengrong Liu, Ziyang Zhou, Mingzhi Cai. 2021.
7. A Practical Algorithm for Spatial Agglomerative Clustering [O]. Thom Castermans, Bettina Speckmann, Kevin Verbeek. 2019.
8. Comparison of Agglomerative and Partitional Document Clustering Algorithms [R]. Zhao, Y., Karypis, G. 2002.
