In this paper, we propose a new fuzzy clustering algorithm based on themode-seeking framework. Given a dataset in $mathbb{R}^d$, we define regions ofhigh density that we call cluster cores. We then consider a random walk on aneighborhood graph built on top of our data points which is designed to beattracted by high density regions. The strength of this attraction iscontrolled by a temperature parameter $eta > 0$. The membership of a point toa given cluster is then the probability for the random walk to hit thecorresponding cluster core before any other. While many properties of randomwalks (such as hitting times, commute distances, etcdots) have been shown toenventually encode purely local information when the number of data pointsgrows, we show that the regularization introduced by the use of cluster coressolves this issue. Empirically, we show how the choice of $eta$ influencesthe behavior of our algorithm: for small values of $eta$ the result is closeto hard mode-seeking whereas when $eta$ is close to $1$ the result is similarto the output of a (fuzzy) spectral clustering. Finally, we demonstrate thescalability of our approach by providing the fuzzy clustering of a proteinconfiguration dataset containing a million data points in $30$ dimensions.
展开▼