The incremental gradient method is a prominent algorithm for minimizing a finite sum of smooth convex functions, used in many contexts including large-scale data processing applications and distributed optimization over networks. It is a first-order method that processes the functions one at a time based on their gradient information. The incremental Newton method, on the other hand, is a second-order variant which additionally exploits the curvature information of the underlying functions and can therefore be faster. In this paper, we focus on the case when the objective function is strongly convex and present fast convergence results for the incremental gradient and incremental Newton methods under constant and diminishing stepsizes. For a decaying stepsize rule $\alpha_k = \Theta(1/k^s)$ with $s \in (0,1]$, we show that the distance of the IG iterates to the optimal solution converges at rate ${\cal O}(1/k^{s})$ (which translates into an ${\cal O}(1/k^{2s})$ rate in the suboptimality of the objective value). For $s > 1/2$, this improves the previous ${\cal O}(1/\sqrt{k})$ results in distances obtained for the case when the functions are non-smooth. We show that to achieve the fastest ${\cal O}(1/k)$ rate, the incremental gradient method needs a stepsize that requires tuning to the strong convexity parameter, whereas the incremental Newton method does not. The results are based on viewing the incremental gradient method as a gradient descent method with gradient errors, devising efficient upper bounds for the gradient error to derive inequalities that relate the distances of consecutive iterates to the optimal solution, and finally applying Chung's lemmas from the stochastic approximation literature to these inequalities to determine their asymptotic behavior. In addition, we construct examples to show the tightness of our rate results.
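As a concrete illustration of the setting described above, the following is a minimal Python sketch of the cyclic incremental gradient method with a diminishing stepsize $\alpha_k = C/k^s$ on a strongly convex quadratic finite sum. The problem data $(A_i, b_i)$, the constant $C$, and the default exponent $s$ are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Minimal sketch of the incremental gradient (IG) method on a strongly convex
# quadratic finite sum  f(x) = sum_i 0.5 * ||A_i x - b_i||^2.  The data
# (A_i, b_i), the constant C, and the exponent s in the stepsize
# alpha_k = C / k^s are illustrative choices, not values from the paper.

rng = np.random.default_rng(0)
m, d = 20, 5                                            # number of components, dimension
A = np.eye(d) + 0.1 * rng.standard_normal((m, d, d))    # well-conditioned components
b = rng.standard_normal((m, d))

def component_grad(i, x):
    """Gradient of the i-th component f_i(x) = 0.5 * ||A_i x - b_i||^2."""
    return A[i].T @ (A[i] @ x - b[i])

def incremental_gradient(x0, s=1.0, C=0.1, n_cycles=1000):
    """Cyclic IG: one pass over the m components per outer iteration k,
    using the diminishing stepsize alpha_k = C / k^s for the whole pass."""
    x = x0.copy()
    for k in range(1, n_cycles + 1):
        alpha = C / k**s
        for i in range(m):                              # process components one at a time
            x = x - alpha * component_grad(i, x)
    return x

# Closed-form minimizer of the full sum, used to check the distance ||x_k - x*||.
H = sum(A[i].T @ A[i] for i in range(m))
g = sum(A[i].T @ b[i] for i in range(m))
x_star = np.linalg.solve(H, g)

x_ig = incremental_gradient(np.zeros(d), s=1.0)
print("distance to the optimum after the run:", np.linalg.norm(x_ig - x_star))
```

The sketch tracks the quantity analyzed in the paper, the distance of the iterates to the optimal solution; choosing $s$ closer to 1 corresponds to the faster decay regimes discussed in the abstract.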