Advanced Robotics: The International Journal of the Robotics Society of Japan

Design of restricted normalizing flow towards arbitrary stochastic policy with computational efficiency

Abstract

This paper proposes a new method for designing a stochastic control policy using a normalizing flow (NF). In reinforcement learning (RL), the policy is usually modeled as a probability distribution with trainable parameters. If this parameterization lacks expressiveness, the optimal policy may not be acquired. A mixture model is capable of universal approximation, but its redundancy increases the computational cost, which can become a bottleneck in real-time robot control. As an alternative, an NF, which adds parameters for an invertible transformation of a simple base distribution, is expected to offer high expressiveness at a lower computational cost. However, the mean of an NF cannot be computed analytically because of the complexity of the invertible transformation, and an NF policy lacks reliability because it retains stochastic behavior after being deployed as a robot controller. This paper therefore designs a restricted NF (RNF) that achieves an analytic mean by appropriately restricting the invertible transformation. In addition, the expressiveness lost through this restriction is regained by using a bimodal Student-t distribution as the base; the resulting model is called Bit-RNF. In RL benchmarks, the Bit-RNF policy outperformed previous policy models. Finally, a real robot experiment demonstrated the applicability of the Bit-RNF policy to the real world.
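The abstract's two central ideas, a flow built from an invertible transformation of a simple base distribution and a restriction that keeps the mean analytic, can be illustrated with a minimal sketch. It assumes the simplest restriction known to give a closed-form mean, an element-wise affine map a = mu + sigma * z, and it models the "bimodal Student-t" base as an equal-weight mixture of two Student-t components centred at -sep and +sep. Both choices, and the names `BimodalStudentT`, `AffineFlowPolicy`, `sep`, `scale` and `nu`, are hypothetical; the paper's actual RNF restriction and base distribution may differ.

```python
import math
import random


def student_t_logpdf(x, loc, scale, nu):
    """Log-density of a location-scale Student-t distribution."""
    y = (x - loc) / scale
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1) / 2 * math.log1p(y * y / nu))


class BimodalStudentT:
    """Hypothetical bimodal base: equal-weight mixture of two Student-t
    components at -sep and +sep (an assumption, not the paper's exact form)."""

    def __init__(self, sep=1.0, scale=0.5, nu=5.0):
        if nu <= 1:
            raise ValueError("nu must exceed 1 for the mean to exist")
        self.sep, self.scale, self.nu = sep, scale, nu

    def log_prob(self, z):
        lp1 = student_t_logpdf(z, -self.sep, self.scale, self.nu)
        lp2 = student_t_logpdf(z, self.sep, self.scale, self.nu)
        m = max(lp1, lp2)  # log-sum-exp for numerical stability
        return m + math.log(0.5 * math.exp(lp1 - m) + 0.5 * math.exp(lp2 - m))

    def sample(self, rng):
        # Student-t draw: standard normal divided by sqrt(chi2(nu) / nu)
        chi2 = rng.gammavariate(self.nu / 2, 2.0)
        t = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / self.nu)
        centre = self.sep if rng.random() < 0.5 else -self.sep
        return centre + self.scale * t

    def mean(self):
        # Symmetric mixture: the mean is 0 and finite because nu > 1
        return 0.0


class AffineFlowPolicy:
    """Hypothetical restricted flow: element-wise affine bijection
    a = mu + sigma * z applied to the base sample z."""

    def __init__(self, mu, sigma, base):
        if sigma <= 0:
            raise ValueError("sigma must be positive for invertibility")
        self.mu, self.sigma, self.base = mu, sigma, base

    def sample(self, rng):
        return self.mu + self.sigma * self.base.sample(rng)

    def log_prob(self, a):
        # Change of variables: log p_a(a) = log p_z(f^-1(a)) - log|df/dz|
        z = (a - self.mu) / self.sigma
        return self.base.log_prob(z) - math.log(self.sigma)

    def mean(self):
        # Linearity of expectation gives the mean in closed form
        return self.mu + self.sigma * self.base.mean()


rng = random.Random(0)
policy = AffineFlowPolicy(mu=0.3, sigma=0.8, base=BimodalStudentT())
samples = [policy.sample(rng) for _ in range(100_000)]
print(policy.mean())                # 0.3 (analytic)
print(sum(samples) / len(samples))  # Monte Carlo estimate, close to 0.3
```

Because the mean is available in closed form, a deployed controller could output `policy.mean()` as a deterministic action instead of sampling. With a general nonlinear bijection f, E[f(z)] generally differs from f(E[z]), so the mean would instead need a Monte Carlo estimate.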