Fast Nonparametric Density-Based Clustering of Large Datasets Using a Stochastic Approximation Mean-Shift Algorithm

Hyrien, Ollivier; Baran, Andrea

Abstract

Mean shift is an iterative procedure often used as a nonparametric clustering algorithm that defines clusters based on the modal regions of a density function. The algorithm is conceptually appealing and makes no assumptions about the shape of the clusters or about their number. However, with a complexity of O(n^2) per iteration, it does not scale well to large datasets. We propose a novel algorithm that performs density-based clustering much faster than mean shift while delivering virtually identical results. This algorithm combines subsampling and a stochastic approximation procedure to achieve a potential complexity of O(n) at each step. Its convergence is established. Its performance is evaluated using simulations and applications to image segmentation, where the algorithm was tens or hundreds of times faster than mean shift while causing a negligible number of clustering errors. The algorithm can be combined with existing approaches to further accelerate clustering.
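
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the Gaussian kernel, the one-dimensional data, the subsample size m, the step size t^(-0.75), and the function names mean_shift_target, mean_shift, and sa_mean_shift are all assumptions made for illustration. Classical mean shift recomputes a kernel-weighted mean over all n points at every update, so moving all n points costs O(n^2) per iteration. The stochastic variant below draws a small subsample at each step and moves only part of the way toward the subsample estimate, in the spirit of a Robbins-Monro procedure.

```python
import math
import random


def mean_shift_target(x, points, h):
    """Kernel-weighted mean of `points` around x (Gaussian kernel, bandwidth h)."""
    weights = [math.exp(-0.5 * ((x - p) / h) ** 2) for p in points]
    return sum(w * p for w, p in zip(weights, points)) / sum(weights)


def mean_shift(x, data, h, n_iter=100):
    """Classical mean shift: every update uses all n points."""
    for _ in range(n_iter):
        x = mean_shift_target(x, data, h)
    return x


def sa_mean_shift(x, data, h, m=40, n_iter=500, seed=0):
    """Illustrative stochastic-approximation variant (assumed scheme, not the paper's)."""
    rng = random.Random(seed)
    for t in range(1, n_iter + 1):
        batch = rng.sample(data, m)   # subsample of size m, much smaller than n
        gamma = t ** -0.75            # assumed decreasing Robbins-Monro step size
        x += gamma * (mean_shift_target(x, batch, h) - x)
    return x


# Two well-separated 1-D clusters, centred at 0 and at 10.
data = [i / 50 for i in range(-50, 51)] + [10 + i / 50 for i in range(-50, 51)]
print(mean_shift(-0.8, data, h=1.0), sa_mean_shift(-0.8, data, h=1.0))
```

In this sketch each stochastic update costs O(m) instead of O(n), and the decreasing step size averages out the subsampling noise, so both routines should end near the same mode. How the paper chooses the subsample, the step sizes, and the stopping rule is not stated in the abstract.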


Bibliographic record

Source: Journal of Computational and Graphical Statistics: A joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America, 2016, No. 3, 18 pages
Authors: Hyrien, Ollivier; Baran, Andrea
Original format: PDF
Language: English
Chinese Library Classification: Applied statistical mathematics
Keywords: Image segmentation; Large datasets; Robbins-Monro procedure

