首页> 外文期刊>IEEE Transactions on Signal Processing >Fast Discrete Distribution Clustering Using Wasserstein Barycenter With Sparse Support
【24h】

Fast Discrete Distribution Clustering Using Wasserstein Barycenter With Sparse Support

机译:使用Wasserstein重心和稀疏支持的快速离散分布聚类

获取原文
获取原文并翻译 | 示例

摘要

In a variety of research areas, the weighted bag of vectors and the histogram are widely used descriptors for complex objects. Both can be expressed as discrete distributions. D2-clustering pursues the minimum total within-cluster variation for a set of discrete distributions subject to the Kantorovich–Wasserstein metric. D2-clustering has a severe scalability issue, the bottleneck being the computation of a centroid distribution, called Wasserstein barycenter, that minimizes its sum of squared distances to the cluster members. In this paper, we develop a modified Bregman ADMM approach for computing the approximate discrete Wasserstein barycenter of large clusters. In the case when the support points of the barycenters are unknown and have low cardinality, our method achieves high accuracy empirically at a much reduced computational cost. The strengths and weaknesses of our method and its alternatives are examined through experiments, and we recommend scenarios for their respective usage. Moreover, we develop both serial and parallelized versions of the algorithm. By experimenting with large-scale data, we demonstrate the computational efficiency of the new methods and investigate their convergence properties and numerical stability. The clustering results obtained on several datasets in different domains are highly competitive in comparison with some widely used methods in the corresponding areas.
机译:在各种研究领域中,向量的加权包和直方图是复杂对象的广泛使用的描述符。两者都可以表示为离散分布。对于遵循Kantorovich-Wasserstein指标的一组离散分布,D2聚类追求最小的总聚类内变化。 D2群集存在严重的可伸缩性问题,瓶颈在于计算质心分布(称为Wasserstein重心)的方法,该方法最大程度地减少了距群集成员的平方距离之和。在本文中,我们开发了一种改进的Bregman ADMM方法,用于计算大型星团的近似离散Wasserstein重心。在重心的支撑点未知且基数较低的情况下,我们的方法凭经验实现了高精度,而计算成本却大大降低了。通过实验检查了我们的方法及其替代方法的优缺点,并建议了各自使用的方案。此外,我们开发了该算法的串行和并行版本。通过大规模数据的实验,我们证明了新方法的计算效率,并研究了它们的收敛性和数值稳定性。与相应领域中一些广泛使用的方法相比,在不同领域的几个数据集上获得的聚类结果具有很高的竞争力。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号