IEEE Transactions on Knowledge and Data Engineering

VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning


Abstract

In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of snapshot and starting points in SVRG and its proximal variant, Prox-SVRG, the two vectors of VR-SGD are set to the average and last iterate of the previous epoch, respectively. These settings allow us to use much larger learning rates, and also make our convergence analysis more challenging. We also design two different update rules for smooth and non-smooth objective functions, respectively, which means that VR-SGD can tackle non-smooth and/or non-strongly convex problems directly without any reduction techniques. Moreover, we analyze the convergence properties of VR-SGD for strongly convex problems, which show that VR-SGD attains linear convergence. Unlike most algorithms, which have no convergence guarantees for non-strongly convex problems, we also provide convergence guarantees of VR-SGD for this case, and empirically verify that VR-SGD with varying learning rates achieves performance similar to its momentum-accelerated variant, which has the optimal convergence rate $\mathcal{O}(1/T^2)$. Finally, we apply VR-SGD to solve various machine learning problems, such as convex and non-convex empirical risk minimization, and leading eigenvalue computation. Experimental results show that VR-SGD converges significantly faster than SVRG and Prox-SVRG, and usually outperforms state-of-the-art accelerated methods, e.g., Katyusha.
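To make the idea in the abstract concrete, the following is a minimal sketch of VR-SGD for the smooth case, written in Python/NumPy. It is not the authors' reference implementation: the `grad_i`/`full_grad` callbacks, the fixed step size, and the toy least-squares data are assumptions for illustration, and the proximal update rule for non-smooth objectives is omitted. It only shows the choice highlighted in the abstract: the snapshot is the average iterate of the previous epoch, while each epoch starts from the previous epoch's last iterate.

```python
import numpy as np

def vr_sgd(grad_i, full_grad, x0, n, eta, num_epochs, m=None):
    """Sketch of VR-SGD for a smooth finite-sum objective F(x) = (1/n) * sum_i f_i(x).

    grad_i(x, i): gradient of the i-th component f_i at x   (user-supplied callback)
    full_grad(x): full gradient of F at x                    (user-supplied callback)
    Unlike SVRG, the snapshot is the *average* iterate of the previous epoch,
    while each epoch *starts* from the previous epoch's last iterate.
    """
    m = m if m is not None else n    # number of inner stochastic steps per epoch
    x = x0.copy()                    # starting point of the current epoch
    snapshot = x0.copy()             # snapshot point used for the full gradient
    rng = np.random.default_rng()
    for _ in range(num_epochs):
        mu = full_grad(snapshot)             # full gradient at the snapshot
        running_sum = np.zeros_like(x)
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced stochastic gradient
            g = grad_i(x, i) - grad_i(snapshot, i) + mu
            x = x - eta * g
            running_sum += x
        snapshot = running_sum / m           # next snapshot <- average of this epoch's iterates
        # x itself (the last iterate) carries over as the next epoch's starting point
    return x

# Toy usage on a least-squares problem (data and step size are illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)
gi = lambda x, i: (A[i] @ x - b[i]) * A[i]     # gradient of 0.5 * (a_i^T x - b_i)^2
fg = lambda x: A.T @ (A @ x - b) / len(b)      # full gradient of the average loss
x_hat = vr_sgd(gi, fg, np.zeros(10), n=len(b), eta=0.02, num_epochs=30)
```

Because the variance-reduced gradient vanishes at the optimum, this kind of scheme tolerates a noticeably larger constant step size than plain SGD, which is the property the abstract emphasizes for VR-SGD's choice of snapshot and starting point.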
