Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

Banerjee Arindam; Dhillon Inderjit S.; Ghosh Joydeep; Sra Suvrit

Abstract

Several large-scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L₂-normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multivariate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration parameters of this mixture. Numerical estimation of the concentration parameters is non-trivial in high dimensions since it involves functional inversion of ratios of Bessel functions. We also formulate two clustering algorithms corresponding to the variants of EM that we derive. Our approach provides a theoretical basis for the use of cosine similarity that has been widely employed by the information retrieval community, and obtains the spherical kmeans algorithm (kmeans with cosine similarity) as a special case of both variants. Empirical results on clustering of high-dimensional text and gene-expression data based on a mixture of vMF distributions show that the ability to estimate the concentration parameter for each vMF component, which is not present in existing approaches, yields superior results, especially for difficult clustering tasks in high-dimensional spaces.
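
The abstract notes that estimating the concentration parameter requires inverting a ratio of Bessel functions. The following pure-Python sketch (an illustration, not the authors' implementation) fits a single vMF component to unit vectors in d dimensions. The mean direction is the normalized resultant vector, and the maximum-likelihood κ solves A_d(κ) = I_{d/2}(κ) / I_{d/2−1}(κ) = r̄, where r̄ is the length of the resultant divided by the number of points. A_d is evaluated by running the recurrence I_{ν−1}(x) − I_{ν+1}(x) = (2ν/x)·I_ν(x) backward as a continued fraction, then inverted by bisection, since A_d increases monotonically in κ. The sketch also includes the closed-form approximation κ̂ ≈ r̄(d − r̄²)/(1 − r̄²), which is widely attributed to this paper; here it only seeds the search bracket. The function names, the fixed recurrence depth and the tolerance are assumptions chosen for the example.

```python
import math
from typing import List, Tuple

Vector = List[float]


def bessel_ratio(d: int, kappa: float) -> float:
    """A_d(kappa) = I_{d/2}(kappa) / I_{d/2-1}(kappa) for kappa > 0.

    Uses r_nu = 1 / (2*nu/kappa + r_{nu+1}), which follows from the
    recurrence I_{nu-1} - I_{nu+1} = (2*nu/kappa) * I_nu, evaluated
    backward from a deep starting index where the ratio is close to 0.
    """
    if kappa <= 0.0:
        raise ValueError("kappa must be positive")
    nu = d / 2.0
    depth = int(kappa) + 200  # assumption: a generous, fixed recurrence depth
    r = 0.0
    for k in range(depth, -1, -1):
        r = 1.0 / (2.0 * (nu + k) / kappa + r)
    return r


def kappa_approx(r_bar: float, d: int) -> float:
    """Assumption: closed-form approximation r_bar (d - r_bar^2) / (1 - r_bar^2)."""
    return r_bar * (d - r_bar ** 2) / (1.0 - r_bar ** 2)


def kappa_mle(r_bar: float, d: int, rel_tol: float = 1e-10) -> float:
    """Solve A_d(kappa) = r_bar by bisection; A_d increases monotonically in kappa."""
    if not 0.0 < r_bar < 1.0:
        raise ValueError("r_bar must lie strictly between 0 and 1")
    # The approximation only seeds the upper end of the bracket.
    lo, hi = 0.0, max(1.0, kappa_approx(r_bar, d))
    while bessel_ratio(d, hi) < r_bar:  # widen the bracket if the seed is too small
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if bessel_ratio(d, mid) < r_bar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_vmf(points: List[Vector]) -> Tuple[Vector, float]:
    """Mean direction and ML concentration of one vMF component from unit vectors."""
    n, d = len(points), len(points[0])
    resultant = [sum(col) for col in zip(*points)]
    length = math.sqrt(sum(x * x for x in resultant))
    mu = [x / length for x in resultant]
    return mu, kappa_mle(length / n, d)


if __name__ == "__main__":
    # In d = 3 the ratio has a closed form: A_3(kappa) = coth(kappa) - 1/kappa.
    print(bessel_ratio(3, 2.0), 1.0 / math.tanh(2.0) - 0.5)
    pts = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.8, 0.0, 0.6]]
    print(fit_vmf(pts))
```

The d = 3 closed form is a convenient check on the recurrence; in higher dimensions, comparing `kappa_approx` with `kappa_mle` on the same r̄ shows how close the approximation is for a given setting.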

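
The abstract states that spherical kmeans (kmeans with cosine similarity) is obtained as a special case of both EM variants. A minimal pure-Python sketch of that special case, written for illustration and not taken from the paper: each vector is L₂-normalized onto the unit hypersphere, so cosine similarity reduces to a dot product; each point is assigned to the centroid with the highest similarity; and each centroid is re-estimated as the normalized sum of its members. The helper names, the deterministic initialization from the first k points, and the iteration cap are assumptions made to keep the example short.

```python
import math
from typing import List, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Project a nonzero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(data: List[Vector], k: int,
                     max_iter: int = 50) -> Tuple[List[int], List[Vector]]:
    """Hard-assignment clustering by cosine similarity on the unit hypersphere.

    Assumption: centroids start at the first k normalized points, a simple
    deterministic choice used only for illustration.
    """
    points = [l2_normalize(v) for v in data]
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment: for unit vectors, cosine similarity is just the dot product.
        new_labels = [max(range(k), key=lambda j: dot(p, centroids[j])) for p in points]
        # Update: each centroid becomes the normalized sum of its members.
        for j in range(k):
            members = [p for p, c in zip(points, new_labels) if c == j]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            summed = [sum(col) for col in zip(*members)]
            norm = math.sqrt(dot(summed, summed))
            if norm > 0.0:
                centroids[j] = [x / norm for x in summed]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    data = [[1.0, 0.1], [2.0, 0.0], [0.0, 1.0], [0.1, 3.0]]
    labels, centroids = spherical_kmeans(data, k=2)
    print(labels, centroids)
```

On this toy data the first two points (near the x-axis) and the last two (near the y-axis) end up in separate clusters. Note that the sketch has no per-cluster concentration parameter, which is exactly the ingredient the abstract credits for the improved results of the full vMF mixture.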

Bibliographic details

Source
Journal of Machine Learning Research | 2005, No. 9 | 38 pages
Authors
Banerjee Arindam; Dhillon Inderjit S.; Ghosh Joydeep; Sra Suvrit
Original format: PDF
Chinese Library Classification: Computing technology; computer technology

Similar literature

1. The von Mises-Fisher distribution of the first exit point from the hypersphere of the drifted Brownian motion and the density of the first exit time [J]. Gatto R. Statistics & Probability Letters. 2013, No. 7.
2. Clustering Directions Based on the Estimation of a Mixture of Von Mises-Fisher Distributions [J]. Adelaide Figueiredo. The Open Statistics & Probability Journal. 2017, No. 1.
3. Clustering using EM and CEM, cluster number selection via the Von Mises-Fisher mixture models [J]. Wafia Parr Bouberima, Mohamed Nadif, Yamina Khemal Bencheikh. International Journal of Open Problems in Computer Science and Mathematics. 2013, No. 1.
4. Von Mises-Fisher Mean Shift for Clustering on a Hypersphere [C]. Kobayashi Takumi, Otsu Nobuyuki. 2010 20th International Conference on Pattern Recognition. 2010.
5. Covariance Modelling with Hypersphere Decomposition Method and Modified Hypersphere Decomposition Method [D]. Li, Qingze. 2018.
6. A parcellation scheme based on von Mises-Fisher distributions and Markov random fields for segmenting brain regions using resting-state fMRI [O]. Srikanth Ryali, Tianwen Chen, Kaustubh Supekar.
7. Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams [O]. Harishchandra Dubey, Abhijeet Sangwan, John H.L. Hansen. 2018.