Journal: IEEE Transactions on Neural Networks and Learning Systems

A Conclusive Analysis of the Finite-Time Behavior of the Discretized Pursuit Learning Automaton

Abstract

This paper deals with the finite-time analysis (FTA) of learning automata (LA), a topic on which very little work has been reported in the literature, as opposed to the asymptotic steady-state analysis, for which there are probably scores of papers. As clarified later, the FTA of Markov chains in general, and of LA in particular, is unarguably far more complex than the asymptotic steady-state analysis. Such an FTA provides rigid bounds on the time required for the LA to attain a given convergence accuracy. We concentrate on the FTA of the Discretized Pursuit Automaton (DPA), which is probably one of the fastest and most accurate LA reported. Although such an analysis was carried out many years ago, we point out that the previous work is flawed. In brief, the flaw lies in the wrongly "derived" monotonic behavior of the LA after a certain number of iterations; rather, we claim that the property that should be invoked is the submartingale property. This renders the proof much more involved and deep. In this paper, we rectify the flaw and reestablish the FTA based on this submartingale phenomenon. More importantly, from the derived analysis, we are able to discover and clarify, for the first time, the underlying dilemma between the DPA's exploitation and exploration properties. We also nontrivially confirm the existence of the optimal learning rate, which yields a better comprehension of the DPA itself.
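
To make the object of the analysis concrete, here is a minimal simulation sketch of a discretized pursuit automaton. It assumes the reward-inaction form commonly described in the LA literature: the action probabilities move in discrete steps of size 1/(rN), with r actions and resolution N, toward the action with the highest current reward estimate, and only when the chosen action is rewarded. The abstract does not specify the exact variant, initialization, or notation the paper uses. Names such as `run_dpa`, `resolution`, `n_init`, and `accuracy` are hypothetical. The sketch is only meant to show empirically how the step size trades convergence speed against accuracy. It does not reproduce the paper's finite-time bounds or its submartingale argument.

```python
import random


def run_dpa(reward_probs, resolution, n_init=10, accuracy=0.99,
            max_steps=100_000, seed=None):
    """Simulate one run of a reward-inaction discretized pursuit automaton.

    Assumed variant (not taken from the paper). Returns
    (steps_taken, index_of_dominant_action). n_init must be >= 1 so that
    every reward estimate is defined.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    delta = 1.0 / (r * resolution)          # smallest probability step, 1/(rN)

    # Hypothetical initialization: sample each action n_init times to seed
    # the reward estimates (wins / counts) before learning starts.
    counts = [n_init] * r
    wins = [sum(rng.random() < reward_probs[i] for _ in range(n_init))
            for i in range(r)]

    p = [1.0 / r] * r                        # action probability vector
    for t in range(1, max_steps + 1):
        a = rng.choices(range(r), weights=p)[0]
        rewarded = rng.random() < reward_probs[a]
        wins[a] += rewarded                  # estimates are updated every step
        counts[a] += 1

        # "Pursue" the action with the highest current reward estimate.
        best = max(range(r), key=lambda i: wins[i] / counts[i])
        if rewarded:                         # no probability update on penalty
            for j in range(r):
                if j != best:
                    p[j] = max(p[j] - delta, 0.0)
            p[best] = 1.0 - sum(p[j] for j in range(r) if j != best)

        if max(p) >= accuracy:               # converged to the given accuracy
            return t, p.index(max(p))
    return max_steps, p.index(max(p))


if __name__ == "__main__":
    env = [0.9, 0.5, 0.2]                    # illustrative reward probabilities
    for n in (5, 20, 100):
        runs = [run_dpa(env, resolution=n, seed=s) for s in range(200)]
        frac_best = sum(idx == 0 for _, idx in runs) / len(runs)
        mean_steps = sum(t for t, _ in runs) / len(runs)
        print(f"N={n}: fraction converged to best={frac_best:.3f}, "
              f"mean steps={mean_steps:.1f}")
```

In this sketch a larger resolution N gives a smaller step, so the automaton keeps exploring longer before committing. It typically converges to the best action more often, but it needs more iterations. This is the kind of exploration/exploitation tension the abstract refers to, shown here only empirically and under the stated assumptions.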