Applied Intelligence

On incorporating the paradigms of discretization and Bayesian estimation to create a new family of pursuit learning automata

Abstract

There are currently two fundamental paradigms that have been used to enhance the convergence speed of Learning Automata (LA). The first involves utilizing estimates of the reward probabilities, while the second involves discretizing the probability space in which the LA operates. This paper demonstrates how both of these can be utilized simultaneously, in particular by using the family of Bayesian estimates, which have been proven to have distinct advantages over their maximum likelihood counterparts. The success of LA-based estimator algorithms over the classical, Linear Reward-Inaction (L_RI)-like schemes can be explained by their ability to pursue the actions with the highest reward probability estimates. Without access to reward probability estimates, it makes sense for schemes like the L_RI to first make large exploring steps, and then to gradually turn exploration into exploitation by making progressively smaller learning steps. However, this behavior becomes counter-intuitive when pursuing actions based on their estimated reward probabilities. Learning should then ideally proceed in progressively larger steps, as the reward probability estimates become more accurate. This paper introduces a new estimator algorithm, the Discretized Bayesian Pursuit Algorithm (DBPA), that achieves this by incorporating both of the above paradigms. The DBPA is implemented by linearly discretizing the action probability space of the Bayesian Pursuit Algorithm (BPA) (Zhang et al. in IEA-AIE 2011, Springer, New York, pp. 608–620, 2011). The key innovation of this paper is that the linear discrete updating rules mitigate the counter-intuitive behavior of the corresponding linear continuous updating rules by augmenting them with the reward probability estimates. Extensive experimental results show the superiority of the DBPA over previous estimator algorithms. Indeed, the DBPA is probably the fastest reported LA to date. Apart from the rigorous experimental demonstration of the strength of the DBPA, the paper also briefly records the proofs of why the BPA and the DBPA are ϵ-optimal in stationary environments.