IEEE International Conference on Robotics and Automation

Optimizing Simulations with Noise-Tolerant Structured Exploration

Abstract

We propose a simple, drop-in, noise-tolerant replacement for the standard finite difference procedure used ubiquitously in blackbox optimization. In our approach, parameter perturbation directions are defined by a family of structured orthogonal matrices. We show that, at the small cost of computing a Fast Walsh-Hadamard/Fourier Transform (FWHT/FFT), such structured finite differences consistently give higher-quality approximations of gradients and Jacobians than vanilla approaches that use coordinate directions or random Gaussian perturbations. We find that trajectory optimizers such as Iterative LQR and Differential Dynamic Programming require fewer iterations to solve several classic continuous control tasks when our methods, rather than standard finite differences, are used to linearize noisy blackbox dynamics. By embedding structured exploration in a quasi-Newton optimizer (LBFGS), we are able to learn agile walking and turning policies for quadruped locomotion that transfer successfully from simulation to actual hardware. We theoretically justify our methods via bounds on the quality of gradient reconstruction and provide a basis for applying them to nonsmooth problems as well.
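
The idea of structured finite differences described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the matrix family, so the sketch assumes a single randomized Hadamard block M = HD/sqrt(n), where H is the Sylvester Hadamard matrix and D is a random ±1 diagonal, and it assumes the dimension n is a power of two. The names `fwht`, `hadamard_directions`, and `structured_gradient` are hypothetical. Forward differences are taken along the rows of M, and the gradient is recovered as M^T y with one FWHT.

```python
import numpy as np


def fwht(a):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    a = np.asarray(a, dtype=float).copy()
    n = a.shape[0]
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            x = a[i:i + h].copy()
            y = a[i + h:i + 2 * h].copy()
            a[i:i + h] = x + y          # butterfly: sum half
            a[i + h:i + 2 * h] = x - y  # butterfly: difference half
        h *= 2
    return a


def hadamard_directions(n, rng):
    """Return (M, signs) with M = H D / sqrt(n); the rows of M are orthonormal.

    Assumption: one randomized Hadamard block stands in for the paper's
    'family of structured orthogonal matrices'.
    """
    signs = rng.choice([-1.0, 1.0], size=n)
    eye = np.eye(n)
    # H is symmetric, so row i of H equals fwht(e_i).
    H = np.stack([fwht(eye[i]) for i in range(n)])
    return (H * signs) / np.sqrt(n), signs


def structured_gradient(f, x, delta=1e-4, rng=None):
    """Estimate grad f(x) from forward differences along the rows of M."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = x.shape[0]
    M, signs = hadamard_directions(n, rng)
    f0 = f(x)
    # y[i] approximates the directional derivative M[i] . grad f(x).
    y = np.array([(f(x + delta * M[i]) - f0) / delta for i in range(n)])
    # M is orthogonal, so grad ~= M^T y = D H y / sqrt(n): a single FWHT.
    return signs * fwht(y) / np.sqrt(n)
```

For a smooth, noiseless test function, the estimate should agree with the analytic gradient up to the finite-difference truncation error. The sketch shows only the reconstruction step; it does not demonstrate the noise tolerance reported in the abstract.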