JMLR: Workshop and Conference Proceedings

Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability



Abstract

We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz (2003). For a particular local optimum of the empirical risk, with an \textit{arbitrary initialization}, we show that, with high probability, at least one of the following two events will occur: (1) the Langevin trajectory ends up somewhere outside the $\varepsilon$-neighborhood of this particular optimum within a short \textit{recurrence time}; (2) it enters this $\varepsilon$-neighborhood by the recurrence time and stays there until a potentially exponentially long \textit{escape time}. We call this phenomenon \textit{empirical metastability}. This two-timescale characterization aligns nicely with the existing literature in the following two senses. First, the effective recurrence time (i.e., the number of iterations multiplied by the stepsize) is dimension-independent, and resembles the convergence time of continuous-time deterministic Gradient Descent (GD). However, unlike GD, the Langevin algorithm does not require strong conditions on local initialization, and has the possibility of eventually visiting all optima. Second, the scaling of the escape time is consistent with the Eyring-Kramers law, which states that the Langevin scheme will eventually visit all local minima, but will take an exponentially long time to transit among them. We apply this path-wise concentration result in the context of statistical learning to examine local notions of generalization and optimality.
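The two-timescale behavior described above can be illustrated with a minimal numerical sketch of the discrete-time Langevin iteration, $w_{t+1} = w_t - \eta \nabla F(w_t) + \sqrt{2\eta/\beta}\,\xi_t$. The toy 1-D double-well objective, the parameter values, and all function names below are illustrative choices, not taken from the paper.

```python
import numpy as np

def grad_double_well(w):
    # F(w) = (w^2 - 1)^2 is non-convex, with local minima at w = -1 and w = +1
    # separated by a barrier at w = 0.
    return 4.0 * w * (w**2 - 1.0)

def langevin(w0, eta=1e-3, beta=10.0, steps=20000, seed=0):
    """Discrete-time Langevin algorithm:
    w_{t+1} = w_t - eta * grad F(w_t) + sqrt(2 * eta / beta) * xi_t,
    with xi_t standard Gaussian noise and inverse temperature beta."""
    rng = np.random.default_rng(seed)
    w = w0
    traj = [w]
    for _ in range(steps):
        noise = np.sqrt(2.0 * eta / beta) * rng.standard_normal()
        w = w - eta * grad_double_well(w) + noise
        traj.append(w)
    return np.array(traj)

# From an arbitrary initialization, the iterate falls into the
# epsilon-neighborhood of one of the two minima within a short recurrence
# time, then lingers there: escaping over the barrier takes on the order
# of exp(beta * barrier height) effective time (Eyring-Kramers scaling).
traj = langevin(w0=0.5)
```

With a small stepsize and moderate noise, the trajectory concentrates near one minimum for many iterations; raising the temperature (lowering `beta`) shortens the escape time and lets the scheme transit between basins.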


