ROBUST REINFORCEMENT LEARNING FOR CONSTRAINT SATISFACTION WHILE ACCOUNTING FOR MODEL MISSPECIFICATION
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for learning a control policy for controlling an agent. One of the methods includes: sampling a mini-batch of one or more observation-action-reward tuples generated by interactions of a first agent with a first environment; determining an update to the current values of the Q-value neural network parameters by minimizing a robust constrained temporal difference (TD) error that accounts for possible perturbations of the states of the first environment represented by the observations in the tuples; and determining, using the Q-value neural network, an update to the policy network parameters from the sampled mini-batch of observation-action-reward tuples.
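The abstract does not define the robust constrained TD error, so the following is only an illustrative sketch of one way such a quantity could be computed, not the claimed method. It assumes several things that are not in the abstract: each tuple also carries a constraint cost and a next observation; the constraint is folded into the reward with a fixed Lagrange multiplier `lam`; perturbations are a finite set of offsets added to the next observation; the Q-value network is replaced by a linear function over discrete actions; and the worst case is taken as the minimum bootstrapped value over the perturbed next states. All function and parameter names are hypothetical.

```python
import numpy as np

def q_values(weights, obs):
    """Linear stand-in for the Q-value network (assumption).

    weights: (n_actions, obs_dim), obs: (batch, obs_dim)
    returns: (batch, n_actions)
    """
    return obs @ weights.T

def robust_constrained_td_error(weights, batch, perturbations,
                                gamma=0.99, lam=0.5):
    """TD error with a cost penalty and a worst-case bootstrap value.

    batch: (obs, actions, rewards, costs, next_obs); costs and next_obs
           are assumed extras beyond the observation-action-reward tuple.
    perturbations: (n_pert, obs_dim) offsets applied to next_obs.
    """
    obs, actions, rewards, costs, next_obs = batch

    # Constraint handling (assumed): penalize reward by lam * cost.
    penalized_reward = rewards - lam * costs

    # Every perturbed version of every next observation:
    # (batch, 1, obs_dim) + (1, n_pert, obs_dim) -> (batch, n_pert, obs_dim)
    perturbed_next = next_obs[:, None, :] + perturbations[None, :, :]

    # Q-values at each perturbed next state: (batch, n_pert, n_actions)
    next_q = perturbed_next @ weights.T

    # Greedy value per perturbed state, then the worst case over perturbations.
    worst_case_value = next_q.max(axis=2).min(axis=1)

    target = penalized_reward + gamma * worst_case_value
    current = q_values(weights, obs)[np.arange(len(actions)), actions]
    return target - current

def robust_loss(weights, batch, perturbations, gamma=0.99, lam=0.5):
    """Mean squared robust constrained TD error over the mini-batch."""
    delta = robust_constrained_td_error(weights, batch, perturbations,
                                        gamma, lam)
    return float(np.mean(delta ** 2))
```

Under these assumptions, the Q parameters would be updated by a gradient step on `robust_loss`, and the policy update would then use the resulting Q-values on the same mini-batch. When the perturbation set contains the zero offset, the robust target can never exceed the nominal (unperturbed) target, which is the pessimism that robustness to model misspecification is meant to introduce.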