Towards Clustering High-dimensional Gaussian Mixture Clouds in Linear Running Time

Dan Kushnir; Shirin Jalali; Iraj Saniee

首页> 外文期刊>JMLR: Workshop and Conference Proceedings >Towards Clustering High-dimensional Gaussian Mixture Clouds in Linear Running Time

【24h】

Towards Clustering High-dimensional Gaussian Mixture Clouds in Linear Running Time

机译：在线性运行时间内实现高维高斯混合云的聚类

获取原文

掌桥外文数据库（机构版） >>

开具论文收录证明 >>

文献代查 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

Clustering mixtures of Gaussian distributions is a fundamental and challenging problem. State-of-the-art theoretical work on learning Gaussian mixture models has mostly focused on estimating the mixture parameters, where clustering is given as a byproduct. These methods have focused mostly on improving separation bounds for different mixture classes, and doing so in polynomial time and sample complexity. Less emphasis has been given to aligning these algorithms to the challenges of big data. In this paper, we focus on clustering $n$ samples from an arbitrary mixture of $c$-separated Gaussians in $mathbb{R}^p$ in time that is linear in $p$ and $n$, and sample complexity that is independent of $p$. Our analysis suggests that for sufficiently separated Gaussians after $o(log{p})$ random projections a good direction is found that yields a small clustering error. Specifically, for a user-specified error $e$, the expected number of such projections is small and bounded by $o(ln p)$ when $gammaleq csqrt{ln{ln{p}}}$ and $gamma=Q^{-1}(e)$ is the separation of the Gaussians with $Q$ as the tail distribution function of the normal distribution. Consequently, the expected overall running time of the algorithm is linear in $n$ and quasi-linear in $p$ at $o(ln{p})O(np)$, and the sample complexity is independent of $p$. Unlike the methods that are based on $k$-means, our analysis is applicable to any mixture class (spherical or non-spherical). Finally, an extension to $k>2$ components is also provided.

机译：高斯分布的混合聚类是一个基本且具有挑战性的问题。有关学习高斯混合模型的最新理论工作主要集中在估计混合参数上，其中聚类作为副产品给出。这些方法主要集中在改进不同混合物类别的分离界限上，并在多项式时间和样品复杂度方面做到这一点。很少强调使这些算法适应大数据的挑战。在本文中，我们集中于将来自$ mathbb {R} ^ p $中由$ c $分隔的高斯分布的任意混合中的$ n $样本聚类，并且在$ p $和$ n $中呈线性，并且样本复杂度高独立于$ p $。我们的分析表明，对于在$ o（ log {p}）$个随机投影之后经过充分分离的高斯，可以找到一个好的方向，该方向会产生较小的聚类误差。具体来说，对于用户指定的错误$ e $，当$ gamma leq c sqrt { ln { ln {p}}时，此类投影的预期数量较小，并受$ o（ ln p）$限制} $和$ gamma = Q ^ {-1}（e）$是高斯项的分离，其中$ Q $是正态分布的尾部分布函数。因此，在$ o（ ln {p}）O（np）$时，算法的预期总运行时间以$ n $为线性，以$ p $为准线性，并且样本复杂度与$ p $独立。与基于$ k $ -means的方法不同，我们的分析适用于任何混合类别（球形或非球形）。最后，还提供了对$ k> 2 $组件的扩展。

著录项

来源
《JMLR: Workshop and Conference Proceedings》 |2018年第2009期|共9页
作者
Dan Kushnir; Shirin Jalali; Iraj Saniee;
展开▼
作者单位

展开▼
收录信息
原文格式 PDF
正文语种
中图分类人工智能理论;
关键词

相似文献

外文文献
中文文献
专利

1. Regularized Gaussian Mixture Model for High-Dimensional Clustering [J] . Yang Zhao, Abhishek K. Shrivastava, Kwok Leung Tsui Cybernetics, IEEE Transactions on . 2019,第10期

机译：高维聚类的正则化高斯混合模型
2. CHIME: CLUSTERING OF HIGH-DIMENSIONAL GAUSSIAN MIXTURES WITH EM ALGORITHM AND ITS OPTIMALITY [J] . Cai T. Tony, Ma Jing, Zhang Linjun The Annals of Statistics: An Official Journal of the Institute of Mathematical Statistics . 2019,第3期

机译：Chime：使用EM算法的高维高斯混合物的聚类及其最优性
3. Gaussian mixture copulas for high-dimensional clustering and dependency-based subtyping [J] . Chemical geology . 2019,第期

机译：高斯混合Copulas用于高维聚类和基于依赖性的亚型
4. Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering [C] . Thibault Lesieur, Caterina de Bacco, Jess Banks, Annual Allerton Conference on Communication, Control, and Computing . 2016

机译：高维高斯混合聚类中的相变和最优算法
5. Cloud Versus Bare Metal a Comparison of a High Performance Computing Cluster Running in a Commercial Cloud and on a Traditional Hardware Cluster Using OpenMP and OpenMPI [D] . Bilaniuk, Vicky 2019

机译：云与裸机的比较-使用OpenMP和OpenMPI在商业云和传统硬件集群上运行的高性能计算集群的比较
6. Real-Time EEG Signal Enhancement Using Canonical Correlation Analysis and Gaussian Mixture Clustering [O] . Chin-Teng Lin, Chih-Sheng Huang, Wen-Yu Yang, 2018

机译：典型相关分析和高斯混合聚类的实时脑电信号增强
7. Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering [O] . Lesieur, Thibault, De Bacco, Caterina, Banks, Jess, 2016

机译：高维高斯混合聚类中的相变和优化算法

Towards Clustering High-dimensional Gaussian Mixture Clouds in Linear Running Time

摘要

著录项

相似文献

相关主题

期刊订阅