Venue: Australasian Joint Conference on Artificial Intelligence

A Deterministic Actor-Critic Approach to Stochastic Reinforcements



Abstract

Learning optimal policies under stochastic rewards presents a challenge for well-known reinforcement learning algorithms such as Q-learning. Q-learning has been shown to suffer from a positive bias that inhibits it from learning under inconsistent rewards. Actor-critic methods, however, do not suffer from this bias, but they may still fail to acquire the optimal policy under rewards of high variance. We propose the use of a reward-shaping function to minimize the variance within stochastic rewards. By reformulating Q-learning as a deterministic actor-critic, we show that the use of such a reward-shaping function improves the acquisition of optimal policies under stochastic reinforcements.
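The positive bias mentioned in the abstract stems from Q-learning's max operator: when Q-value estimates are noisy, the maximum of the estimates overestimates the maximum of the true values. A minimal simulation can illustrate this (an illustrative sketch, not the paper's method; the two-action setup, step size, and noise model are assumptions chosen for the demonstration):

```python
import random

# Two actions, both with TRUE expected reward 0, but rewards are noisy.
# Q-learning's greedy value estimate max(Q) is positively biased, because
# whichever action happens to have upward noise gets selected by the max.
random.seed(0)

def noisy_reward():
    # zero-mean, high-variance stochastic reward
    return random.gauss(0.0, 1.0)

runs, steps, alpha = 2000, 100, 0.1
bias_sum = 0.0
for _ in range(runs):
    q = [0.0, 0.0]                   # Q-value estimates for two actions
    for _ in range(steps):
        a = random.randrange(2)      # uniform exploration
        q[a] += alpha * (noisy_reward() - q[a])   # Q-learning update (bandit case)
    bias_sum += max(q)               # greedy value estimate at the end of the run

avg_bias = bias_sum / runs
print(avg_bias)  # positive on average, even though both true values are 0
```

Averaged over many runs, `max(q)` comes out clearly above zero, whereas an unbiased estimate of the best achievable value would be zero; this is the bias that inconsistent rewards amplify and that motivates the paper's variance-reducing reward shaping.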


