Conference: American Control Conference
Reinforcement learning with supervision by combining multiple learnings and expert advices

Abstract

In this paper, we provide a formal, coherent learning framework in which reinforcement learning is combined with multiple parallel learners and expert advice in order to accelerate the convergence of learning. Our approach is simply to use a nonstationary "potential-based reinforcement function" to shape the reinforcement signal given to the learning "base-agent". The base-agent employs SARSA(0) or adaptive asynchronous value iteration (VI). Supervised inputs to the base-agent come from "subagents" that run other independent reinforcement learning processes in parallel and, if available, from experts. These inputs are "merged" into the value of the potential-based reinforcement function, and that value is inserted into the update equation of SARSA(0) for the Q-function estimate, or of adaptive asynchronous VI for the optimal value function estimate. The resulting SARSA(0) and adaptive asynchronous VI each converge to an optimal policy.
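
As a rough illustration of the mechanism described in the abstract, the Python sketch below adds a time-varying potential-based shaping term to a tabular SARSA(0) update. The shaping form F = gamma * Phi_{t+1}(s') - Phi_t(s), the weighted-sum merge rule, the decay schedule, and the five-state chain environment are all assumptions made for illustration. The abstract does not specify them, and none of the function names come from the paper.

```python
import random
from collections import defaultdict


def merged_potential(advisors, weights):
    """Merge advisor value estimates into one potential (hypothetical rule:
    a weighted sum; the paper's actual merge rule is not given here)."""
    def phi(s):
        return sum(w * v(s) for w, v in zip(weights, advisors))
    return phi


def shaping_term(phi_now, phi_next, s, s_next, gamma):
    """Assumed nonstationary potential-based form:
    F = gamma * Phi_{t+1}(s') - Phi_t(s)."""
    return gamma * phi_next(s_next) - phi_now(s)


def epsilon_greedy(Q, s, actions, eps, rng):
    """Pick a random action with probability eps, else a greedy one."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])


def sarsa0_shaped(step, start, actions, potential_at, episodes=200,
                  alpha=0.1, gamma=0.9, eps=0.1, max_steps=100, seed=0):
    """Tabular SARSA(0) whose reward is augmented by the shaping term."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    t = 0  # global time index that drives the nonstationary potential
    for _ in range(episodes):
        s = start
        a = epsilon_greedy(Q, s, actions, eps, rng)
        done, steps = False, 0
        while not done and steps < max_steps:
            s_next, r, done = step(s, a)
            a_next = epsilon_greedy(Q, s_next, actions, eps, rng)
            F = shaping_term(potential_at(t), potential_at(t + 1),
                             s, s_next, gamma)
            # No bootstrapping from the next state once the episode ends.
            bootstrap = 0.0 if done else gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (r + F + bootstrap - Q[(s, a)])
            s, a = s_next, a_next
            t += 1
            steps += 1
    return Q


def chain_step(s, a):
    """Toy environment (assumption): states 0..4, reward 1 on reaching 4."""
    s_next = min(max(s + a, 0), 4)
    done = s_next == 4
    return s_next, (1.0 if done else 0.0), done


def subagent_value(s):
    """Stand-in for a parallel learner's value estimate (hypothetical)."""
    return 0.1 * s


def expert_value(s):
    """Stand-in for expert advice expressed as a state value (hypothetical)."""
    return 0.1 * s


def potential_at(t):
    """Potential at time t: merged advice whose weight decays over time."""
    decay = 1.0 / (1.0 + 0.001 * t)
    return merged_potential([subagent_value, expert_value],
                            [0.5 * decay, 0.5 * decay])


if __name__ == "__main__":
    Q = sarsa0_shaped(chain_step, start=0, actions=(1, -1),
                      potential_at=potential_at)
    for s in range(4):
        print(s, max((1, -1), key=lambda a: Q[(s, a)]))
```

Running the script prints the greedy action learned for each non-terminal state of the toy chain. Letting the advice weights decay is only one way to make the potential nonstationary. It reflects the intuition that outside advice matters most early in learning, and it is not a claim about the schedule used in the paper.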
