
Alternating Optimisation and Quadrature for Robust Control


Abstract

Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are unobservable and randomly determined by the environment in a physical setting but are controllable in a simulator. This paper considers the problem of finding a robust policy while taking into account the impact of environment variables. We present Alternating Optimisation and Quadrature (ALOQ), which uses Bayesian optimisation and Bayesian quadrature to address such settings. ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy. Experimental results across different domains show that ALOQ can learn more efficiently and robustly than existing methods.
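The rare-event argument above can be illustrated with a toy problem. This is a hypothetical sketch, not ALOQ: the environment variable `theta`, its distribution `P_THETA`, the return function `simulate_return`, and the grid search (standing in for Bayesian optimisation) are all assumptions made for illustration. Exact summation over a discrete `theta` stands in for Bayesian quadrature. The quantity of interest is the expected return f̄(π) = Σ_θ p(θ) f(π, θ). Because a simulator lets us set `theta` directly, every value can be evaluated and weighted by p(θ). A learner that only sees `theta` drawn at random, as it would be in a physical setting, may never observe the rare event.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): one environment variable
# theta that is random in the physical world but settable in a simulator.
THETAS = np.array([0, 1])            # 0 = normal conditions, 1 = rare event
P_THETA = np.array([0.98, 0.02])     # assumed known distribution p(theta)


def simulate_return(policy: float, theta: int) -> float:
    """Hypothetical simulator return f(pi, theta) for a policy parameter in [0, 1].

    A more aggressive policy earns more under normal conditions but is
    heavily penalised if the rare event occurs.
    """
    return policy if theta == 0 else -100.0 * policy


def expected_return_exact(policy: float) -> float:
    # The simulator lets us set theta directly, so evaluate every value and
    # weight it by p(theta): exact quadrature over a discrete variable.
    return float(sum(p * simulate_return(policy, t) for t, p in zip(THETAS, P_THETA)))


def expected_return_sampled(policy: float, thetas: np.ndarray) -> float:
    # Monte Carlo estimate from theta values drawn as in the physical setting.
    return float(np.mean([simulate_return(policy, t) for t in thetas]))


def best_policy(score, policies: np.ndarray) -> float:
    # Grid search stands in for Bayesian optimisation in this toy example.
    scores = [score(pi) for pi in policies]
    return float(policies[int(np.argmax(scores))])


policies = np.linspace(0.0, 1.0, 11)
n_samples = 10

# Probability that n independent random draws never contain the rare event.
miss_prob = P_THETA[0] ** n_samples

# If the rare event is never drawn, the sampled estimate equals the policy
# parameter itself, so the most aggressive policy looks best.
no_rare_draws = np.zeros(n_samples, dtype=int)

robust = best_policy(expected_return_exact, policies)
naive = best_policy(lambda pi: expected_return_sampled(pi, no_rare_draws), policies)

print(f"P(rare event unseen in {n_samples} draws) = {miss_prob:.3f}")
print(f"policy chosen with exact expectation: {robust}")
print(f"policy chosen when rare event unseen: {naive}")
```

Under these assumptions the exact expectation is 0.98·π − 2·π = −1.02·π, so the cautious policy π = 0 is optimal. Yet in roughly 82% of runs with 10 random draws the rare event never appears, and the sampled estimate then favours π = 1. This toy only motivates why controlling the environment variable matters. It does not reproduce how ALOQ models the return or chooses which `theta` to evaluate.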
