IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Distributed MCMC Inference in Dirichlet Process Mixture Models Using Julia



Abstract

Due to the increasing availability of large data sets, the need for general-purpose massively-parallel analysis tools becomes ever greater. In unsupervised learning, Bayesian nonparametric mixture models, exemplified by the Dirichlet-Process Mixture Model (DPMM), provide a principled approach to adapting model complexity to the data. Despite their potential, however, DPMMs have yet to become a popular tool. This is partly due to the lack of user-friendly software tools that can handle large datasets efficiently. Here we show how, using Julia, one can achieve an efficient and easily-modifiable implementation of distributed inference in DPMMs. In particular, we show how a recent parallel MCMC inference algorithm (originally implemented in C++ for a single multi-core machine) can be distributed efficiently across multiple multi-core machines using a distributed-memory model. This leads to speedups, alleviates memory and storage limitations, and lets us learn DPMMs from significantly larger datasets of higher dimensionality. It also turns out that, even on a single machine, the proposed Julia implementation handles higher dimensions more gracefully (at least for Gaussians) than the original C++ implementation. Finally, we use the proposed implementation to learn a model of image patches and apply the learned model to image denoising. While we speculate that a highly-optimized distributed implementation in, say, C++ could have been faster than the proposed implementation in Julia, from our perspective as machine-learning researchers (as opposed to HPC researchers), the latter also offers practical and monetary value owing to its ease of development and level of abstraction. Our code is publicly available at https://github.com/dinarior/dpmm_subclusters.jl.
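
The abstract itself contains no code, but the distributed-memory pattern it describes maps naturally onto Julia's standard Distributed library. Below is a minimal, purely illustrative sketch, not the authors' implementation: it uses a toy fixed-size isotropic-Gaussian mixture rather than a full DPMM, and the function local_stats is hypothetical. Each worker holds a shard of the data, samples cluster assignments for its shard given the current cluster parameters, and returns sufficient statistics that the master reduces before updating the parameters.

    # Illustrative sketch only: the data-sharding pattern described in the
    # abstract, shown for a toy fixed-size Gaussian mixture, not a full DPMM.
    using Distributed
    addprocs(4)  # local workers; addprocs(["node1", "node2"]) would span machines

    @everywhere begin
        # One shard-local step of a Gibbs sweep: sample an assignment for each
        # point given the current means, then return per-cluster counts and
        # coordinate sums (the sufficient statistics).
        function local_stats(shard::Matrix{Float64}, means::Vector{Vector{Float64}})
            k = length(means)
            counts = zeros(Int, k)
            sums = [zeros(size(shard, 1)) for _ in 1:k]
            for j in 1:size(shard, 2)
                x = shard[:, j]
                logp = [-0.5 * sum(abs2, x .- m) for m in means]  # unit-variance Gaussians
                p = exp.(logp .- maximum(logp))
                p ./= sum(p)
                z = something(findfirst(cumsum(p) .>= rand()), k)  # categorical draw
                counts[z] += 1
                sums[z] .+= x
            end
            return counts, sums
        end
    end

    d, n, k = 2, 10_000, 3
    data = randn(d, n)
    shards = [data[:, w:nworkers():n] for w in 1:nworkers()]  # one shard per worker
    means = [randn(d) for _ in 1:k]

    # One distributed sweep: workers sample assignments for their shards in
    # parallel; the master reduces the statistics and updates the means (a
    # full sampler would draw the means from their posterior instead).
    stats = pmap(s -> local_stats(s, means), shards)
    for c in 1:k
        total = sum(st[1][c] for st in stats)
        total > 0 && (means[c] = sum(st[2][c] for st in stats) ./ total)
    end

The point the abstract makes about the distributed-memory model is visible even in this toy: only small per-cluster sufficient statistics cross machine boundaries in the reduction step, never the raw data shards, which is what lets the approach scale to datasets too large for a single machine's memory.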