JMLR: Workshop and Conference Proceedings

CPSG-MCMC: Clustering-Based Preprocessing method for Stochastic Gradient MCMC

Abstract

In recent years, stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods have been proposed to handle large-scale datasets by learning iteratively from small minibatches. However, the high variance introduced by naive subsampling usually slows convergence to the desired posterior distribution. In this paper, we propose an effective subsampling strategy that reduces this variance; it grew out of an unsuccessful attempt at importance sampling. Specifically, before sampling, we partition the dataset with the k-means clustering algorithm in a preprocessing step and keep this clustering fixed throughout the entire MCMC simulation. During the simulation, we approximate the gradient of the log-posterior by summing the estimated gradients of the individual clusters. The resulting procedure is surprisingly simple and does not increase the complexity of the original algorithm during sampling. We apply our Clustering-based Preprocessing strategy to stochastic gradient Langevin dynamics, stochastic gradient Hamiltonian Monte Carlo, and stochastic gradient Riemannian Langevin dynamics, and we provide thorough numerical results that support the effectiveness and efficiency of our approach.
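The abstract describes the estimator only in words. Read as stratified sampling, it amounts to grad log p(θ | X) ≈ grad log p(θ) + Σ_k (N_k / n_k) Σ_{i ∈ B_k} grad log p(x_i | θ), where cluster k holds N_k points and B_k is a minibatch of n_k points drawn from it. Because each cluster's sum is estimated separately, only within-cluster spread contributes to the variance. The following is a minimal sketch under stated assumptions, not the authors' code: a hypothetical 1-D Gaussian-mean model (x_i ~ N(θ, 1), prior θ ~ N(0, 10)), proportional allocation of the minibatch across clusters (the paper's allocation rule is not given here), plain SGLD as the base sampler, and hypothetical helper names (`kmeans_1d`, `clustered_grad`, `naive_grad`, `sgld`).

```python
import math
import random

PRIOR_VAR = 10.0  # hypothetical prior: theta ~ N(0, PRIOR_VAR)


def grad_log_prior(theta):
    # d/dtheta log N(theta | 0, PRIOR_VAR)
    return -theta / PRIOR_VAR


def grad_log_lik(x, theta):
    # d/dtheta log N(x | theta, 1) for a single data point
    return x - theta


def kmeans_1d(xs, k, iters=20):
    """Preprocessing step: Lloyd's k-means on scalar data, run once before sampling."""
    s = sorted(xs)
    # Deterministic init at evenly spaced quantiles (a simplifying assumption).
    centroids = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty.
        centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return [c for c in clusters if c]


def full_grad(theta, data):
    """Exact gradient of the log-posterior (cost O(N))."""
    return grad_log_prior(theta) + sum(grad_log_lik(x, theta) for x in data)


def naive_grad(theta, data, batch_size, rng):
    """Naive estimator: one uniform minibatch, rescaled by N / n."""
    batch = rng.sample(data, batch_size)
    scale = len(data) / batch_size
    return grad_log_prior(theta) + scale * sum(grad_log_lik(x, theta) for x in batch)


def clustered_grad(theta, clusters, batch_size, rng):
    """Sum of per-cluster estimates, each rescaled by N_k / n_k (stratified sampling)."""
    n_total = sum(len(c) for c in clusters)
    g = grad_log_prior(theta)
    for c in clusters:
        # Proportional allocation of the minibatch (assumption), at least one point per cluster.
        n_k = min(len(c), max(1, round(batch_size * len(c) / n_total)))
        batch = rng.sample(c, n_k)
        g += len(c) / n_k * sum(grad_log_lik(x, theta) for x in batch)
    return g


def sgld(grad_fn, theta0, step, n_steps, rng):
    """SGLD: theta += (step / 2) * grad + N(0, step) noise."""
    theta, samples = theta0, []
    for _ in range(n_steps):
        theta += 0.5 * step * grad_fn(theta) + rng.gauss(0.0, math.sqrt(step))
        samples.append(theta)
    return samples


def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two well-separated groups, so within-cluster variance is small.
    data = [rng.gauss(-5, 1) for _ in range(500)] + [rng.gauss(5, 1) for _ in range(500)]
    clusters = kmeans_1d(data, k=2)  # fixed for the whole simulation
    naive = [naive_grad(0.0, data, 10, rng) for _ in range(2000)]
    strat = [clustered_grad(0.0, clusters, 10, rng) for _ in range(2000)]
    print("naive variance:", variance(naive))
    print("clustered variance:", variance(strat))
    samples = sgld(lambda t: clustered_grad(t, clusters, 10, rng), 0.0, 1e-4, 6000, rng)
    print("posterior mean estimate:", sum(samples[1000:]) / 5000)
```

In this toy setting the clustered estimator's variance should come out far below the naive one, since the naive minibatch mixes points from both groups while each per-cluster estimate only sees the spread inside its own cluster. The clustering is computed once and reused at every step, so the per-step cost stays the same as plain minibatch SGLD, which matches the abstract's claim of no added complexity during sampling.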