The Bayesian Pursuit Algorithm: A New Family of Estimator Learning Automata

Abstract

The fastest Learning Automata (LA) algorithms currently available come from the family of estimator algorithms. The Pursuit algorithm (PST), a pioneering scheme in the estimator family, obtains its superior learning speed by using Maximum Likelihood (ML) estimates to pursue the action currently perceived as being optimal. Recently, a Bayesian LA (BLA) was introduced, and empirical results were reported that demonstrated its advantages over established top performers, including the PST scheme. The BLA scheme is inherently Bayesian, but it avoids computational intractability by relying only on updating the hyper-parameters of sibling conjugate priors and on random sampling from the resulting posteriors. In this paper, we integrate the foundational learning principles motivating the design of the BLA with the principles of the PST. By doing this, we have obtained a completely novel, and rather pioneering, approach to solving LA-like problems, namely the Bayesian Pursuit algorithm (BPST). As in the BLA, the estimates are truly Bayesian (as opposed to ML) in nature. However, the action selection probability vector of the PST is used for exploration. Also, unlike the ML estimate, which is usually a single value, a posterior distribution permits us to choose any one of a spectrum of values in the posterior as the appropriate estimate. Thus, in this paper, we have chosen the 95th percentile of the posterior (instead of the mean) to pursue the most promising actions. Further, as advocated in [7], the pursuit has been done using both the Linear Reward-Penalty and Reward-Inaction philosophies, leading to the corresponding BPST_(RP) and BPST_(RI) schemes respectively. It turns out that the BPST is superior to the PST, with the BPST_(RI) being even more robust than the BPST_(RP). Moreover, by controlling the learning speed of the BPST, the BPST schemes perform either better than, or comparably to, the BLA. We thus believe that the BPST constitutes a new avenue of research, in which the performance benefits of the PST and the BLA are mutually augmented, opening the way to improved performance in a number of applications that are currently being tested.
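
The abstract gives no pseudocode, so the following is only a minimal Python sketch of how a Bayesian Pursuit loop of this kind might look, not the authors' implementation. It assumes Bernoulli (reward/penalty) feedback, a Beta(1, 1) prior per action, and a linear pursuit step with learning rate `lam`. The names `beta_cdf`, `beta_quantile`, `bpst`, `lam`, and `update_on_penalty` are hypothetical. The flag `update_on_penalty` is an assumed reading of the Reward-Penalty variant (pursue after every response); leaving it off gives the Reward-Inaction behavior (pursue only after a reward).

```python
import math
import random


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integer a, b (binomial-sum identity)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    total = 0.0
    for j in range(a, n + 1):
        # log of C(n, j) * x^j * (1 - x)^(n - j), kept in log space to avoid overflow
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * log_x + (n - j) * log_1mx)
        total += math.exp(log_term)
    return min(total, 1.0)


def beta_quantile(q, a, b, tol=1e-6):
    """Invert the Beta CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def bpst(reward_probs, lam=0.05, steps=500, q=0.95, update_on_penalty=False, seed=0):
    """Sketch of a Bayesian Pursuit loop on a simulated Bernoulli environment."""
    rng = random.Random(seed)
    r = len(reward_probs)
    p = [1.0 / r] * r          # action selection probability vector (as in PST)
    a = [1] * r                # Beta hyper-parameter: rewards + 1 (assumed prior)
    b = [1] * r                # Beta hyper-parameter: penalties + 1 (assumed prior)
    est = [beta_quantile(q, 1, 1)] * r
    for _ in range(steps):
        i = rng.choices(range(r), weights=p)[0]      # explore using p
        rewarded = rng.random() < reward_probs[i]    # simulated environment response
        if rewarded:
            a[i] += 1
        else:
            b[i] += 1
        est[i] = beta_quantile(q, a[i], b[i])        # 95th percentile of the posterior
        if rewarded or update_on_penalty:
            best = max(range(r), key=lambda j: est[j])
            p = [(1.0 - lam) * pj for pj in p]       # shrink every component
            p[best] += lam                           # pursue the most promising action
    return p, a, b


if __name__ == "__main__":
    probs, a, b = bpst([0.9, 0.6])
    print(probs, a, b)
```

Using an upper percentile rather than the posterior mean makes rarely tried actions look optimistic, since their posteriors are still wide. The exact-sum CDF costs time proportional to the number of observations, so a library quantile routine would be the natural choice for longer runs.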