Practicing Q-Learning

Abstract

Q-Learning has gained increasing attention as a promising real-time scheme for learning from delayed reinforcement. Being compact, model-free, and theoretically optimal, it is commonly preferred to AHC-Learning and its derivatives. However, it has long been noticed that theoretical optimality has to be sacrificed in order to meet the constraints of most applications. In this article we report on experiments with modified Q-Learning algorithms together with their key ingredients for practical success in reinforcement learning. These include optimistic initialization, the principle of piecewise constancy of the policy, and the use of activity traces. Finally, we extend these algorithms to growing RBF networks with additional on-line learning vector quantization (adaptive perceptualization) and again obtain very encouraging results. Our test bed is pole balancing with additional noise on the sensory input.
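The abstract names optimistic initialization and activity (eligibility) traces among the key ingredients. The sketch below is only a generic tabular illustration of those two ideas, not the paper's own algorithm: a Watkins-style Q(lambda) update with optimistically initialized Q-values, written against a hypothetical discretized environment interface (env.reset(), env.step(a)); all parameter values are assumptions.

    import numpy as np

    def watkins_q_lambda(env, n_states, n_actions, episodes=500,
                         alpha=0.1, gamma=0.99, lam=0.8,
                         epsilon=0.1, q_init=10.0):
        # Q-table started at an optimistic value so that unvisited
        # state-action pairs look attractive and get explored.
        Q = np.full((n_states, n_actions), q_init)
        for _ in range(episodes):
            E = np.zeros_like(Q)            # eligibility ("activity") traces
            s = env.reset()
            done = False
            while not done:
                greedy_a = int(np.argmax(Q[s]))
                # epsilon-greedy action selection
                if np.random.rand() < epsilon:
                    a = np.random.randint(n_actions)
                else:
                    a = greedy_a
                s_next, r, done = env.step(a)
                # one-step TD error toward the greedy successor value
                target = r if done else r + gamma * np.max(Q[s_next])
                delta = target - Q[s, a]
                E[s, a] += 1.0              # bump the trace of the visited pair
                Q += alpha * delta * E      # propagate the error along all traces
                # decay traces; cut them after an exploratory (non-greedy) action
                E *= gamma * lam if a == greedy_a else 0.0
                s = s_next
        return Q

In a pole-balancing setting such as the paper's test bed, s would be a discretized index of the (noisy) cart-pole state. Cutting the traces after exploratory actions is the standard Watkins correction and may differ from the authors' activity-trace variant.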
