...
JMLR: Workshop and Conference Proceedings

How Good is SGD with Random Shuffling?



Abstract

We study the performance of stochastic gradient descent (SGD) on smooth and strongly-convex finite-sum optimization problems. In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with replacement, we focus here on popular but poorly-understood heuristics, which involve going over random permutations of the individual functions. This setting has been investigated in several recent works, but the optimal error rates remain unclear. In this paper, we provide lower bounds on the expected optimization error with these heuristics (using SGD with any constant step size), which elucidate their advantages and disadvantages. In particular, we prove that after $k$ passes over $n$ individual functions, if the functions are re-shuffled after every pass, the best possible optimization error for SGD is at least $\Omega\left(1/(nk)^2 + 1/nk^3\right)$, which partially corresponds to recently derived upper bounds. Moreover, if the functions are only shuffled once, then the lower bound increases to $\Omega(1/nk^2)$. Since there are strictly smaller upper bounds for repeated reshuffling, this proves an inherent performance gap between SGD with single shuffling and repeated shuffling. As a more minor contribution, we also provide a non-asymptotic $\Omega(1/k^2)$ lower bound (independent of $n$) for the incremental gradient method, when no random shuffling takes place. Finally, we provide an indication that our lower bounds are tight, by proving matching upper bounds for univariate quadratic functions.
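The schemes compared in the abstract differ only in how the $n$ component functions are ordered across the $k$ passes. The following minimal Python sketch (not from the paper; the objective, step size, and names such as `run_sgd` are illustrative assumptions) contrasts repeated reshuffling, single shuffling, and the fixed-order incremental method with a constant step size on a toy univariate least-squares problem, i.e. a sum of univariate quadratics as in the paper's matching upper bounds.

```python
import numpy as np

def run_sgd(a, b, k, eta, scheme, rng):
    """One run of constant-step-size SGD: k passes over n component functions
    f_i(x) = 0.5 * (a_i * x - b_i)^2, averaged objective f(x) = mean_i f_i(x).

    scheme: 'reshuffle'   -> fresh random permutation every pass
            'single'      -> one random permutation, reused every pass
            'incremental' -> fixed order 0..n-1, no shuffling
    """
    n = len(a)
    x = 0.0
    perm = rng.permutation(n)  # drawn once per run; used by 'single'
    for _ in range(k):
        if scheme == "reshuffle":
            order = rng.permutation(n)
        elif scheme == "single":
            order = perm
        else:  # 'incremental'
            order = np.arange(n)
        for i in order:
            grad = a[i] * (a[i] * x - b[i])  # gradient of the i-th component
            x -= eta * grad
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 50, 20
    a = rng.uniform(0.5, 1.5, size=n)
    b = rng.normal(size=n)
    x_star = np.dot(a, b) / np.dot(a, a)          # minimizer of the averaged objective
    f = lambda x: 0.5 * np.mean((a * x - b) ** 2)
    for scheme in ("reshuffle", "single", "incremental"):
        errs = [f(run_sgd(a, b, k, eta=0.5 / n, scheme=scheme, rng=rng)) - f(x_star)
                for _ in range(200)]
        print(f"{scheme:12s} mean optimization error: {np.mean(errs):.3e}")
```

Averaging the optimization error over many runs in such a toy setup typically shows repeated reshuffling ahead of a single shuffle at the same step size, which is the empirical counterpart of the gap the lower bounds formalize.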
