Approximating Action-Value Functions: Addressing Issues of Dynamic Range

Abstract

Function approximation is necessary when applying reinforcement learning (RL) to Markov decision processes (MDPs) or semi-Markov decision processes (SMDPs) with very large state spaces. An often overlooked issue in approximating Q-functions in either framework arises when an action-value update in one state causes a large policy change in other states. Equivalently, a small change in the Q-function can produce a large change in the implied greedy policy. We call this sensitivity to changes in the Q-function the dynamic range problem and suggest that it can greatly increase the number of training updates required to accurately approximate the optimal policy. We demonstrate that Advantage Learning solves the dynamic range problem in both frameworks and is more robust than some other RL algorithms on these problems. For an MDP, Advantage Learning addresses the issue by rescaling the dynamic range of action values within each state by a constant; for SMDPs, the scaling constant can vary with each action.
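To illustrate the rescaling the abstract refers to, the sketch below shows a tabular Advantage Learning update in which the temporal-difference term is divided by a scaling constant k, expanding the gap between action values within a state (k = 1 recovers standard Q-learning). This is a minimal sketch under our own assumptions; the function name, hyperparameter values, and array layout are illustrative and not taken from the report, and for an SMDP the constant k would typically depend on the action's duration.

    import numpy as np

    def advantage_learning_update(A, s, a, r, s_next,
                                  alpha=0.1, gamma=0.99, k=0.2):
        """One tabular Advantage Learning update (illustrative sketch).

        A : 2-D array of advantage values, indexed as A[state, action].
        k : scaling constant in (0, 1]; k == 1 reduces to Q-learning,
            smaller k widens the spread of action values within a state.
        """
        max_A_s = np.max(A[s])            # value of the current state
        max_A_next = np.max(A[s_next])    # value of the successor state
        # Target: state value plus the TD error rescaled by 1/k, which
        # enlarges the dynamic range of action values within state s.
        target = max_A_s + (r + gamma * max_A_next - max_A_s) / k
        A[s, a] += alpha * (target - A[s, a])
        return A

Because the within-state differences are magnified by 1/k, small approximation errors in the learned function are less likely to flip the implied greedy action, which is the robustness property the abstract attributes to Advantage Learning.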
