Conference: International Joint Conference on Neural Networks

Mixing Habits and Planning for Multi-Step Target Reaching Using Arbitrated Predictive Actor-Critic

Abstract

Internal models are important when agents make decisions based on predictions of future states and their utilities. However, planning with internal models can be time-consuming. It can therefore be useful to rely on a habitual system for repetitive tasks, which can then be executed faster and with fewer algorithmic resources. Current evidence suggests that the brain uses both planning and habitual systems for behavioural control, which requires arbitration between the two. In our previous work [1], we proposed the Arbitrated Predictive Actor-Critic (APAC), a neural architecture that demonstrates cooperative mechanisms of planning and habitual control for one-step mapping. The present study tests the ability of this model to control a simulated two-joint robotic arm in multiple reaching tasks, where movement limitations mean that several steps are needed to reach the target. Our results show that APAC can learn multi-step reaching under various conditions. Interestingly, over the course of training APAC tends to shift from planning to habits, increasingly taking the actions predicted by the habitual controller.
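To make the setting concrete, here is a minimal Python sketch of multi-step reaching with arbitration between a habitual and a planning controller. It is not the authors' APAC network, which the abstract describes as a neural architecture. The link lengths, the per-step joint limit `MAX_STEP`, the nine-candidate search in `plan_action`, and the rule in `arbitrate` (keep the habitual action when the internal model predicts it reduces the distance to the target) are all assumptions chosen for illustration.

```python
import math

LINK1, LINK2 = 1.0, 1.0  # hypothetical link lengths
MAX_STEP = 0.25          # hypothetical per-step limit on each joint change (rad)


def hand_position(q):
    """Internal model: forward kinematics of a planar two-joint arm."""
    q1, q2 = q
    return (LINK1 * math.cos(q1) + LINK2 * math.cos(q1 + q2),
            LINK1 * math.sin(q1) + LINK2 * math.sin(q1 + q2))


def apply_action(q, dq):
    """Apply a joint change, clipping each component to the movement limit."""
    return tuple(a + max(-MAX_STEP, min(MAX_STEP, d)) for a, d in zip(q, dq))


def distance(p, r):
    return math.hypot(p[0] - r[0], p[1] - r[1])


def plan_action(q, target):
    """Planning stand-in: evaluate nine candidate actions with the internal model."""
    steps = (-MAX_STEP, 0.0, MAX_STEP)
    options = [(d1, d2) for d1 in steps for d2 in steps]
    return min(options,
               key=lambda dq: distance(hand_position(apply_action(q, dq)), target))


def arbitrate(q, target, habit_dq):
    """Assumed rule: keep the habitual action if it is predicted to reduce the distance."""
    predicted = distance(hand_position(apply_action(q, habit_dq)), target)
    if predicted < distance(hand_position(q), target):
        return habit_dq, "habit"
    return plan_action(q, target), "plan"


def reach(q, target, habit, tol=0.15, max_steps=50):
    """Step toward the target, logging which controller supplied each action."""
    log = []
    for _ in range(max_steps):
        if distance(hand_position(q), target) <= tol:
            break
        dq, source = arbitrate(q, target, habit(q, target))
        q = apply_action(q, dq)
        log.append(source)
    return q, log


# The target is one radian of shoulder rotation away, so the clipped
# steps of MAX_STEP cannot reach it in a single action.
target = hand_position((1.0, 0.0))
final_q, log = reach((0.0, 0.0), target, habit=lambda q, t: (MAX_STEP, 0.0))
print(len(log), log)
```

In this toy run the habit is a fixed rule rather than a learned policy, so the log only shows which controller the assumed arbitration rule selects at each step. It does not reproduce the training-time shift from planning to habits reported in the abstract.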
