Journal: Frontiers in Neurorobotics

A Novel Model for Arbitration Between Planning and Habitual Control Systems

Abstract

It is well established that human decision making and instrumental control use multiple systems, some of which use habitual action selection and some of which require deliberate planning. Deliberate planning systems predict action outcomes using an internal model of the agent's environment, while habitual action selection systems learn to automate behavior by repeating previously rewarded actions. Habitual control is computationally efficient but not very flexible in changing environments. Conversely, deliberate planning may be computationally expensive but is flexible in dynamic environments. This paper proposes a general architecture comprising both control paradigms by introducing an arbitrator that controls which subsystem is used at any time. The system, which comprises a supervised internal model and deep reinforcement learning, is implemented for a target-reaching task with a simulated two-joint robotic arm. Through permutation of target-reaching conditions, we demonstrate that the proposed model is capable of rapidly learning the kinematics of the system without a priori knowledge, and is robust to (A) changing environmental reward and kinematics, and (B) occluded vision. The arbitrator model is compared to instances of the model that use exclusively deliberate planning with the internal model or exclusively habitual control. The results show how such a model can harness the benefits of both systems, using fast decisions in reliable circumstances while optimizing performance in changing environments. In addition, the proposed model learns very fast. Finally, the system that includes internal models is able to reach the target under visual occlusion, while the pure habitual system is unable to operate sufficiently under such conditions.
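
The arbitration idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the arbitrator decides between subsystems, so everything below is an assumption made for illustration: the `Arbitrator` class, the use of a moving average of reward prediction error as a reliability signal, and the `threshold` and `alpha` values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Arbitrator:
    """Hypothetical arbitrator that routes control to one of two subsystems.

    Assumption: reliability is tracked as an exponential moving average of
    the absolute reward prediction error. The source abstract does not
    specify the actual arbitration criterion.
    """
    threshold: float = 0.2  # assumed switching threshold
    alpha: float = 0.5      # assumed smoothing factor for the moving average
    avg_error: float = 0.0  # running estimate of the prediction error

    def update(self, predicted_reward: float, actual_reward: float) -> None:
        # Blend the newest absolute error into the running average.
        error = abs(actual_reward - predicted_reward)
        self.avg_error = (1 - self.alpha) * self.avg_error + self.alpha * error

    def select(self) -> str:
        # Reliable predictions favor the cheap habitual controller;
        # unreliable ones hand control to the costlier planner.
        return "planning" if self.avg_error > self.threshold else "habitual"


arb = Arbitrator()
choices = []
# Two stable trials where predictions match rewards, then the reward changes.
for predicted, actual in [(1.0, 1.0), (1.0, 1.0), (1.0, 0.0), (1.0, 0.0)]:
    arb.update(predicted, actual)
    choices.append(arb.select())

print(choices)
```

In this toy run the habitual controller is selected while predictions stay accurate, and control moves to planning once the reward changes. This mirrors the trade-off named in the abstract (fast decisions in reliable circumstances, flexibility in changing ones) without claiming to reproduce the authors' model.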