Venue: Australasian Joint Conference on Artificial Intelligence

A Deterministic Actor-Critic Approach to Stochastic Reinforcements



Abstract

Learning optimal policies under stochastic rewards presents a challenge for well-known reinforcement learning algorithms such as Q-learning. Q-learning has been shown to suffer from a positive bias that inhibits it from learning under inconsistent rewards. Actor-critic methods, however, do not suffer from this bias, but they may still fail to acquire the optimal policy under rewards of high variance. We propose the use of a reward-shaping function to minimize the variance within stochastic rewards. By reformulating Q-learning as a deterministic actor-critic, we show that the use of such a reward-shaping function improves the acquisition of optimal policies under stochastic reinforcements.
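The positive bias mentioned in the abstract stems from Q-learning's max operator: when Q-value estimates are noisy, the maximum of the estimates overestimates the maximum of the true values. A minimal simulation can illustrate this (an illustrative sketch, not the paper's method; the two-action setup, step size, and noise model are assumptions chosen for the demonstration):

```python
import random

# Two actions, both with TRUE expected reward 0, but rewards are noisy.
# Q-learning's greedy value estimate max(Q) is positively biased, because
# whichever action happens to have upward noise gets selected by the max.
random.seed(0)

def noisy_reward():
    # zero-mean, high-variance stochastic reward
    return random.gauss(0.0, 1.0)

runs, steps, alpha = 2000, 100, 0.1
bias_sum = 0.0
for _ in range(runs):
    q = [0.0, 0.0]                   # Q-value estimates for two actions
    for _ in range(steps):
        a = random.randrange(2)      # uniform exploration
        q[a] += alpha * (noisy_reward() - q[a])   # Q-learning update (bandit case)
    bias_sum += max(q)               # greedy value estimate at the end of the run

avg_bias = bias_sum / runs
print(avg_bias)  # positive on average, even though both true values are 0
```

Averaged over many runs, `max(q)` comes out clearly above zero, whereas an unbiased estimate of the best achievable value would be zero; this is the bias that inconsistent rewards amplify and that motivates the paper's variance-reducing reward shaping.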


