Journal: Acta Automatica Sinica (English Edition)

Continuous Action Reinforcement Learning for Control-Affine Systems with Unknown Dynamics



Abstract

Control of nonlinear systems is challenging in real time. Decision making, performed many times per second, must ensure system safety. Designing an input to perform a task often involves solving a nonlinear system of differential equations, which is a computationally intensive, if not intractable, problem. This article proposes sampling-based task learning for control-affine nonlinear systems through the combined learning of both state and action-value functions in a model-free approximate value iteration setting with continuous inputs. A quadratic negative definite state-value function implies the existence of a unique maximum of the action-value function at any state. This allows the replacement of the standard greedy policy with a computationally efficient policy approximation that guarantees progression to a goal state without knowledge of the system dynamics. The policy approximation is consistent, i.e., it does not depend on the action samples used to calculate it. This method is appropriate for mechanical systems with high-dimensional input spaces and unknown dynamics performing Constraint-Balancing Tasks. We verify it both in simulation and experimentally for an unmanned aerial vehicle (UAV) carrying a suspended load, and in simulation for the rendezvous of heterogeneous robots.
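The key idea in the abstract can be illustrated with a minimal sketch: for control-affine dynamics and a quadratic negative definite state-value function, the action-value function is a concave quadratic in the input, so its unique maximizer can be recovered analytically from a handful of action samples instead of a greedy argmax. The toy system, gains, and helper names below are hypothetical, not the paper's; the dynamics are used only as a black-box simulator, mimicking the model-free setting.

```python
import numpy as np

# Hypothetical 1-D control-affine system x' = x + dt*(f(x) + g(x)*u).
# The learner never inspects f or g; it only queries this black box.
def step(x, u, dt=0.05):
    return x + dt * (-0.5 * x + 1.0 * u)   # f(x) = -0.5x, g(x) = 1 (assumed)

# Quadratic, negative definite state-value function V(x) = -p*x^2, p > 0.
P = 1.0
def V(x):
    return -P * x ** 2

def policy(x, u_samples):
    # One-step lookahead: Q(x, u) = V(next state), evaluated at sampled actions.
    q = np.array([V(step(x, u)) for u in u_samples])
    # Because the dynamics are affine in u and V is quadratic, q(u) is an exact
    # concave quadratic. Fitting q(u) = a*u^2 + b*u + c and taking the vertex
    # gives the unique maximizer analytically -- the result does not depend on
    # which action samples were drawn (the "consistency" property), unlike a
    # greedy argmax over the samples themselves.
    a, b, c = np.polyfit(u_samples, q, 2)
    return -b / (2 * a)

x = 2.0
u_samples = np.linspace(-1.0, 1.0, 5)
for _ in range(100):
    x = step(x, policy(x, u_samples))
print(abs(x) < 1e-2)  # the state is driven to the goal x = 0
```

Note that the fitted vertex can lie well outside the sampled action range; because the fitted quadratic is exact here, the extrapolation is still the true maximizer, which is precisely why the approximation is sample-independent.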
机译:非线性系统的控制在实时有挑战性。调节制作,每秒执行多次,必须确保系统安全。输入要执行任务的输入往往涉及求解差分方程的非线性系统,这是一种计算密集型的,如果不是难治性的问题。本文通过组合在无模型近似值迭代设置中使用连续输入的模型和动作值函数的组合学习来提出基于采样的任务学习。在不连续输入中的无模型近似值迭代设置中.A二次负定的状态值函数意味着存在a任何状态的动作值函数的唯一最大值。这允许更换标准贪婪策略,以计算的有效策略近似,保证在没有系统动态的情况下向目标状态的进展。策略近似是一致的,即它不依赖于用于计算它的动作样本。这方法适合我具有高维输入空间的Chanical系统和执行约束平衡任务的未知动力学。我们验证其在模拟和实验中,用于携带悬浮负载的无人驾驶航空车辆(无人机)和在模拟中,用于异构机器人的共同集。