IEEE Transactions on Knowledge and Data Engineering

VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning


Abstract

In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of snapshot and starting points in SVRG and its proximal variant, Prox-SVRG, the two vectors of VR-SGD are set to the average and last iterate of the previous epoch, respectively. These settings allow us to use much larger learning rates, and also make our convergence analysis more challenging. We also design two different update rules for smooth and non-smooth objective functions, respectively, which means that VR-SGD can tackle non-smooth and/or non-strongly convex problems directly without any reduction techniques. Moreover, we analyze the convergence properties of VR-SGD for strongly convex problems, which show that VR-SGD attains linear convergence. Unlike most algorithms, which have no convergence guarantees for non-strongly convex problems, we also provide convergence guarantees of VR-SGD for this case, and empirically verify that VR-SGD with varying learning rates achieves performance similar to its momentum-accelerated variant, which has the optimal convergence rate $\mathcal{O}(1/T^2)$. Finally, we apply VR-SGD to solve various machine learning problems, such as convex and non-convex empirical risk minimization, and leading eigenvalue computation. Experimental results show that VR-SGD converges significantly faster than SVRG and Prox-SVRG, and usually outperforms state-of-the-art accelerated methods, e.g., Katyusha.
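To make the idea in the abstract concrete, the following is a minimal sketch of VR-SGD for the smooth case, written in Python/NumPy. It is not the authors' reference implementation: the `grad_i`/`full_grad` callbacks, the fixed step size, and the toy least-squares data are assumptions for illustration, and the proximal update rule for non-smooth objectives is omitted. It only shows the choice highlighted in the abstract: the snapshot is the average iterate of the previous epoch, while each epoch starts from the previous epoch's last iterate.

```python
import numpy as np

def vr_sgd(grad_i, full_grad, x0, n, eta, num_epochs, m=None):
    """Sketch of VR-SGD for a smooth finite-sum objective F(x) = (1/n) * sum_i f_i(x).

    grad_i(x, i): gradient of the i-th component f_i at x   (user-supplied callback)
    full_grad(x): full gradient of F at x                    (user-supplied callback)
    Unlike SVRG, the snapshot is the *average* iterate of the previous epoch,
    while each epoch *starts* from the previous epoch's last iterate.
    """
    m = m if m is not None else n    # number of inner stochastic steps per epoch
    x = x0.copy()                    # starting point of the current epoch
    snapshot = x0.copy()             # snapshot point used for the full gradient
    rng = np.random.default_rng()
    for _ in range(num_epochs):
        mu = full_grad(snapshot)             # full gradient at the snapshot
        running_sum = np.zeros_like(x)
        for _ in range(m):
            i = rng.integers(n)
            # variance-reduced stochastic gradient
            g = grad_i(x, i) - grad_i(snapshot, i) + mu
            x = x - eta * g
            running_sum += x
        snapshot = running_sum / m           # next snapshot <- average of this epoch's iterates
        # x itself (the last iterate) carries over as the next epoch's starting point
    return x

# Toy usage on a least-squares problem (data and step size are illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = rng.standard_normal(200)
gi = lambda x, i: (A[i] @ x - b[i]) * A[i]     # gradient of 0.5 * (a_i^T x - b_i)^2
fg = lambda x: A.T @ (A @ x - b) / len(b)      # full gradient of the average loss
x_hat = vr_sgd(gi, fg, np.zeros(10), n=len(b), eta=0.02, num_epochs=30)
```

Because the variance-reduced gradient vanishes at the optimum, this kind of scheme tolerates a noticeably larger constant step size than plain SGD, which is the property the abstract emphasizes for VR-SGD's choice of snapshot and starting point.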
