Journal of Parallel and Distributed Computing

Accelerating distributed Expectation-Maximization algorithms with frequent updates

Abstract

Expectation-Maximization (EM) is a popular approach for parameter estimation in many applications, such as image understanding, document classification, and genome data analysis. Despite the popularity of EM algorithms, it is challenging to efficiently implement these algorithms in a distributed environment for handling massive data sets. In particular, many EM algorithms that frequently update the parameters have been shown to be much more efficient than their concurrent counterparts. Accordingly, we propose two approaches to parallelize such EM algorithms in a distributed environment so as to scale to massive data sets. We prove that both approaches maintain the convergence properties of the EM algorithms. Based on the approaches, we design and implement a distributed framework, FreEM, to support the implementation of frequent updates for the EM algorithms. We show its efficiency through two categories of EM applications, clustering and topic modeling. These applications include k-means clustering, fuzzy c-means clustering, parameter estimation for the Gaussian Mixture Model, and variational inference for Latent Dirichlet Allocation. We extensively evaluate our framework on both a cluster of local machines and the Amazon EC2 cloud. Our evaluation shows that the EM algorithms with frequent updates implemented on FreEM can converge much faster than those implementations with traditional concurrent updates.
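
To make the contrast between concurrent and frequent updates concrete, here is a minimal single-process sketch using k-means, one of the applications the abstract lists. It is not FreEM's API or the paper's distributed scheme: the function names (`assign`, `kmeans_concurrent`, `kmeans_frequent`), the `block_size` parameter, and the toy data are all assumptions made for illustration. The concurrent version finishes a full E-step over every point before one M-step; the frequent version keeps running sufficient statistics (per-cluster sums and counts) and refreshes the centroids after each block of points, so later points in the same pass already see updated parameters.

```python
import random


def assign(point, centroids):
    """E-step for one point: index of the nearest centroid (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def kmeans_concurrent(points, centroids, iters):
    """Concurrent updates: full E-step over all points, then a single M-step."""
    centroids = [list(c) for c in centroids]
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in centroids]
        counts = [0] * len(centroids)
        for x in points:
            k = assign(x, centroids)
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += x[d]
        for k in range(len(centroids)):
            if counts[k]:  # leave an empty cluster's centroid unchanged
                centroids[k] = [s / counts[k] for s in sums[k]]
    return centroids


def kmeans_frequent(points, centroids, iters, block_size):
    """Frequent updates (illustrative): partial M-step after every block."""
    centroids = [list(c) for c in centroids]
    dim = len(points[0])
    sums = [[0.0] * dim for _ in centroids]   # running sufficient statistics
    counts = [0] * len(centroids)
    labels = [None] * len(points)             # last assignment of each point
    for _ in range(iters):
        for start in range(0, len(points), block_size):
            for i in range(start, min(start + block_size, len(points))):
                x = points[i]
                k = assign(x, centroids)
                old = labels[i]
                if old == k:
                    continue
                if old is not None:           # retract the stale contribution
                    counts[old] -= 1
                    for d in range(dim):
                        sums[old][d] -= x[d]
                counts[k] += 1
                for d in range(dim):
                    sums[k][d] += x[d]
                labels[i] = k
            # Partial M-step: refresh centroids from the running statistics.
            for k in range(len(centroids)):
                if counts[k]:
                    centroids[k] = [s / counts[k] for s in sums[k]]
    return centroids


# Toy data (hypothetical): two well-separated 2-D clusters.
rng = random.Random(0)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
points += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(50)]
rng.shuffle(points)
init = [(1.0, 1.0), (9.0, 9.0)]

print(kmeans_concurrent(points, init, iters=5))
print(kmeans_frequent(points, init, iters=5, block_size=10))
```

On easy data like this both variants reach the same centroids; the sketch only shows where the parameter refresh happens, and says nothing about the speedups or the distributed synchronization that the paper evaluates.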
