Conference: Intelligent Autonomous Systems 9 (IAS-9)

Incremental Purposive Behavior Acquisition based on Modular Learning System


Abstract

Applying reinforcement learning methods directly to real robot tasks is difficult because the exploration space is huge and easily grows exponentially, since recent robots tend to have many kinds of sensors. One potential solution is the so-called "mixture of experts" proposed by Jacobs et al. [1], which decomposes the whole state space into a number of regions so that each expert module can perform well in its assigned region. The idea is general and widely applicable, but it leaves open how to decompose the space into small regions, how to assign each region to a learning module (an expert), and how to define a goal for each module. To address this issue, this paper presents a method of self task decomposition for a modular learning system, based on the learner's own interpretation of instructions given by a coach. Unlike conventional approaches, the system decomposes a long-term task into short-term subtasks so that a single learning module with limited computational resources can acquire a purposive behavior for one of these subtasks. Because the instructions are given from the viewpoint of a coach who has no idea how the system learns, the learner interprets them to find candidate subgoals. Finally, the top layer of the hierarchical reinforcement learning system coordinates the lower learning modules to accomplish the whole task. The method is applied to a simple soccer situation in the context of RoboCup.
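The decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation; everything in it is an assumption made for illustration. The world is a 1-D corridor, the coach's instruction is a demonstrated sequence of states, and a fixed-length cut (`capacity` steps per module) stands in for the paper's self-interpretation of instructions. The names `split_into_subgoals`, `Module`, `run_hierarchy`, and `build_and_run` are hypothetical.

```python
import random

ACTIONS = (-1, 1)  # move left / move right in a 1-D corridor (assumed toy world)


def split_into_subgoals(demo, capacity):
    """Cut a coach demonstration into segments of at most `capacity` steps.

    The last state of each segment is returned as a subgoal candidate.
    This fixed-length cut is an assumed stand-in for self-interpretation.
    """
    subgoals = [demo[i] for i in range(capacity, len(demo), capacity)]
    if not subgoals or subgoals[-1] != demo[-1]:
        subgoals.append(demo[-1])  # the final demonstrated state is always a goal
    return subgoals


class Module:
    """Tabular Q-learner responsible for reaching a single subgoal."""

    def __init__(self, start, goal):
        self.start, self.goal = start, goal
        self.lo, self.hi = min(start, goal), max(start, goal)
        self.q = {(s, a): 0.0
                  for s in range(self.lo, self.hi + 1) for a in ACTIONS}

    def step(self, state, action):
        # The module only operates inside its own small region.
        return max(self.lo, min(self.hi, state + action))

    def train(self, rng, episodes=200, alpha=0.5, gamma=0.9):
        # Q-learning is off-policy, so uniformly random exploration is
        # enough in a region this small.
        for _ in range(episodes):
            s = self.start
            for _ in range(50):
                a = rng.choice(ACTIONS)
                nxt = self.step(s, a)
                done = nxt == self.goal
                reward = 1.0 if done else 0.0
                future = 0.0 if done else max(self.q[(nxt, b)] for b in ACTIONS)
                self.q[(s, a)] += alpha * (reward + gamma * future - self.q[(s, a)])
                s = nxt
                if done:
                    break

    def act(self, state):
        # Greedy action with respect to the learned Q-values.
        return max(ACTIONS, key=lambda a: self.q[(state, a)])


def run_hierarchy(modules, state, max_steps=100):
    """Top layer: activate each module in turn until its subgoal is reached."""
    path = [state]
    for m in modules:
        for _ in range(max_steps):
            if state == m.goal:
                break
            state = m.step(state, m.act(state))
            path.append(state)
    return path


def build_and_run(demo, capacity, seed=0):
    rng = random.Random(seed)
    subgoals = split_into_subgoals(demo, capacity)
    starts = [demo[0]] + subgoals[:-1]
    modules = [Module(s, g) for s, g in zip(starts, subgoals)]
    for m in modules:
        m.train(rng)
    return subgoals, run_hierarchy(modules, demo[0])
```

Calling `build_and_run(list(range(10)), capacity=3)` cuts the ten-state demonstration at states 3, 6, and 9, trains one small module per subgoal, and lets the top layer chain them. The point of the sketch is only that each module learns over a few states instead of the whole corridor.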
