Mixing Habits and Planning for Multi-Step Target Reaching Using Arbitrated Predictive Actor-Critic

Abstract

Internal models are important when agents make decisions based on predictions of future states and their utilities. However, planning with internal models can be time-consuming. It can therefore be useful to rely on a habitual system for repetitive tasks, since habitual control executes faster and requires fewer algorithmic resources. Current evidence suggests that the brain uses both kinds of control system, planning and habits, for behavioural control, which in turn requires arbitration between the two. In our previous work [1], we proposed the Arbitrated Predictive Actor-Critic (APAC), a neural architecture that demonstrates cooperative mechanisms between planning and habitual control systems for one-step mapping. The present study tests the ability of this model to control a simulated two-joint robotic arm in multiple reaching tasks whose movement limitations require multiple steps to solve. Our results show that APAC can learn these multi-step tasks under various conditions. Interestingly, over the course of training, APAC tends to shift from planning to habits, increasingly taking the actions predicted by the habitual controller.
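
The arbitration idea above can be illustrated with a deliberately small Python sketch. It is not the APAC network from [1] or from this paper: the neural actor is replaced by a lookup table (the "habit") that caches the planner's actions, the planner uses the analytic inverse kinematics of a two-joint arm as its internal model, and the arbitration rule (accept the habitual action only when the forward model predicts it brings the hand closer to the target) is an assumption made for illustration. The link lengths, the per-step joint limit MAX_STEP that forces multi-step reaching, the tolerance TOL and the discretization GRID are all hypothetical values.

```python
import math

# Hypothetical parameters, chosen for illustration only (not from the paper).
LINK1, LINK2 = 1.0, 1.0   # link lengths of the simulated two-joint arm
MAX_STEP = 0.2            # per-step joint limit (rad): forces multi-step reaching
TOL = 1e-3                # reaching tolerance in task space
GRID = 0.05               # state discretization used by the tabular "habit"


def forward_model(q):
    """Planar two-joint forward kinematics: joint angles -> hand position."""
    t1, t2 = q
    return (LINK1 * math.cos(t1) + LINK2 * math.cos(t1 + t2),
            LINK1 * math.sin(t1) + LINK2 * math.sin(t1 + t2))


def inverse_model(target):
    """Analytic inverse kinematics (one of the two elbow configurations)."""
    x, y = target
    c2 = (x * x + y * y - LINK1 ** 2 - LINK2 ** 2) / (2 * LINK1 * LINK2)
    t2 = math.acos(max(-1.0, min(1.0, c2)))
    t1 = math.atan2(y, x) - math.atan2(LINK2 * math.sin(t2), LINK1 + LINK2 * math.cos(t2))
    return t1, t2


def plan(q, target):
    """Planner: step toward the IK solution, clipped to the movement limit."""
    goal = inverse_model(target)
    return tuple(max(-MAX_STEP, min(MAX_STEP, g - qi)) for g, qi in zip(goal, q))


def dist(q, target):
    """Task-space distance between the hand and the target."""
    hx, hy = forward_model(q)
    return math.hypot(hx - target[0], hy - target[1])


def key(q, target):
    """Discretized state used to index the habit table (a simplification)."""
    return tuple(round(v / GRID) for v in q) + tuple(target)


def run_episode(habits, target, start=(0.0, 0.0), max_steps=50):
    """Reach `target`; return (reached, fraction of steps taken by the habit)."""
    q, n_steps, n_habit = start, 0, 0
    while n_steps < max_steps and dist(q, target) >= TOL:
        action = habits.get(key(q, target))
        q_pred = None if action is None else tuple(a + b for a, b in zip(q, action))
        # Arbitration (assumed rule): trust the habit only if the forward model
        # predicts that its action brings the hand closer to the target.
        if q_pred is not None and dist(q_pred, target) < dist(q, target):
            n_habit += 1
        else:
            action = plan(q, target)
            habits[key(q, target)] = action   # the habit table learns from the planner
        q = tuple(a + b for a, b in zip(q, action))
        n_steps += 1
    return dist(q, target) < TOL, n_habit / max(n_steps, 1)


habits = {}
for episode in range(3):
    reached, habit_frac = run_episode(habits, target=(0.5, 1.2))
    print(f"episode {episode}: reached={reached}, habit fraction={habit_frac:.2f}")
```

In this toy setup the first episode is solved entirely by the planner; on later episodes the same states recur, the cached actions pass the forward-model check, and every step is taken by the habit. This is only a caricature of the shift from planning to habits described above, which APAC exhibits over training with learned neural controllers rather than exact caching.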

Bibliographic details

Source: International Joint Conference on Neural Networks | 2018 | pp. 1-705 | 8 pages
Authors: Farzaneh S. Fard; Thomas P. Trappenberg
Original format: PDF
Chinese Library Classification: TP183-53
Keywords: Planning; Computational modeling; Predictive models; Task analysis; Manipulators; Inverse problems

Similar literature

1. A hybrid decomposition-reinforced model based on NARX neural networks for short-term multi-step solar radiation forecasting [J]. Huang Jiahao, Liu Hui. Journal of Central South University (English Edition). 2021, No. 2
2. Cooperative traffic signal control using Multi-step return and Off-policy Asynchronous Advantage Actor-Critic Graph algorithm [J]. Yang Shantian, Yang Bo, Wong Hau-San. Knowledge-Based Systems. 2019, No. Nova1
3. An Adaptive Actor-critic Algorithm with Multi-step Simulated Experiences for Controlling Nonholonomic Mobile Robots [J]. Rafiuddin Syam, Keigo Watanabe, Kiyotaka Izumi. Soft Computing. 2007, No. 1
4. An adaptive actor-critic algorithm with multi-step simulated experiences for controlling nonholonomic mobile robots [J]. Syam R, Watanabe K, Izumi K. Soft Computing: A Fusion of Foundations, Methodologies and Applications. 2007, No. 1
5. Mixing Habits and Planning for Multi-Step Target Reaching Using Arbitrated Predictive Actor-Critic [C]. Farzaneh S. Fard, Thomas P. Trappenberg. International Joint Conference on Neural Networks. 2018
6. Understanding media habits: The role of habit in the theory of planned behavior [D]. Lange, Ryan. 2009
7. Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning [O]. Shan Zhong, Quan Liu, QiMing Fu. 2016
8. An adaptive actor-critic algorithm with multi-step simulated experiences for controlling nonholonomic mobile robots [O]. Syam Rafiuddin. 2007
