Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies

Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game


Abstract

The two-armed bandit problem is a classical optimization problem in which a decision maker sequentially pulls one of two arms attached to a gambling machine, with each pull yielding a random reward. The reward distributions are unknown, so one must balance exploiting existing knowledge about the arms against obtaining new information. Bandit problems are particularly fascinating because a large class of real-world problems, including routing, Quality of Service (QoS) control, game playing, and resource allocation, can be solved in a decentralized manner when modeled as a system of interacting gambling machines. Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. This paper proposes a novel scheme for decentralized decision making based on the Goore Game in which each decision maker is inherently Bayesian in nature, yet avoids computational intractability by relying simply on updating the hyperparameters of sibling conjugate priors and on random sampling from the resulting posteriors. We further report theoretical results on the variance of the random rewards experienced by each individual decision maker. Based on these results, each decision maker is able to accelerate its own learning by taking advantage of the increasingly reliable feedback obtained as exploration gradually turns into exploitation in bandit-based learning. Extensive experiments, involving QoS control in simulated wireless sensor networks, demonstrate that the accelerated learning allows us to combine the benefit of conservative learning, namely high accuracy, with the benefit of hurried learning, namely fast convergence. In this manner, our scheme outperforms recently proposed Goore Game solution schemes, in which accuracy must be traded off against speed. As an additional benefit, performance also becomes more stable. We thus believe that our methodology opens avenues for improved performance in a number of applications of bandit-based decentralized decision making.
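The decentralized scheme the abstract describes — each player updating the hyperparameters of conjugate priors and sampling from the resulting posteriors — corresponds, for Bernoulli rewards, to Thompson sampling with Beta priors. The following is a minimal sketch of such players interacting through a Goore Game referee. The player count, target fraction, unimodal reward function, and round count are illustrative assumptions, not values from the paper, and the paper's acceleration step (exploiting the variance results) is omitted:

```python
import random

class BayesianPlayer:
    """Two-armed Bernoulli bandit player using Beta conjugate priors
    (Thompson sampling): sample a success probability from each arm's
    posterior, pull the arm whose sample is larger, then update that
    arm's hyperparameters with the observed binary reward."""

    def __init__(self):
        # Beta(1, 1) uniform priors for arm 0 ("no") and arm 1 ("yes")
        self.alpha = [1, 1]
        self.beta = [1, 1]

    def choose(self):
        samples = [random.betavariate(self.alpha[a], self.beta[a])
                   for a in (0, 1)]
        return 0 if samples[0] > samples[1] else 1

    def update(self, arm, reward):
        # Conjugate update: success increments alpha, failure beta
        if reward:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

def goore_game(n_players=10, target_fraction=0.3, rounds=2000, seed=0):
    """Goore Game loop (illustrative): each round, every player votes
    yes/no; the referee rewards each player independently with a
    probability that is maximal when the fraction of 'yes' votes
    equals target_fraction. Returns the final fraction of 'yes' votes."""
    random.seed(seed)
    players = [BayesianPlayer() for _ in range(n_players)]
    for _ in range(rounds):
        votes = [p.choose() for p in players]
        frac = sum(votes) / n_players
        # Hypothetical unimodal reward probability, peaked at the target
        p_reward = 1.0 - abs(frac - target_fraction)
        for player, vote in zip(players, votes):
            reward = 1 if random.random() < p_reward else 0
            player.update(vote, reward)
    return sum(p.choose() for p in players) / n_players
```

As exploration turns into exploitation, each player's posteriors concentrate and the vote fraction settles near the referee's optimum; the paper's contribution is to exploit the resulting drop in reward variance to accelerate this convergence.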
