Efficient Behavior Learning by Utilizing Estimated State Value of Self and Teammates


Abstract

Applications of reinforcement learning to real robots in multi-agent dynamic environments are limited by the huge exploration space and the enormously long learning time. RoboCup competition is a typical example, since other agents and their behavior easily cause the state and action spaces to explode. This paper presents a method that uses the state value functions of macro actions to explore appropriate behavior efficiently in a multi-agent environment, so that the learning agent can acquire cooperative behavior with its teammates and competitive behavior against its opponents. The key ideas are as follows. First, the agent learns a few macro actions and their state value functions beforehand by reinforcement learning. Second, an appropriate initial controller for learning cooperative behavior is generated from those state value functions. The initial controller uses the state values of the macro actions so that the learner tends to select good macro actions and avoid useless ones. By combining these ideas with a two-layer hierarchical system, the proposed method performs better during learning than conventional methods. The paper presents a case study of a 4 (defense team) on 5 (offense team) game task, in which the learning agent (a passer on the offense team) successfully acquired teamwork plays (pass and shoot) in a shorter learning time.
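
The abstract does not say how the initial controller turns state values into a choice of macro action. As a rough illustration only, the sketch below assumes Boltzmann (softmax) selection over the state values of the pre-learned macro actions, so that higher-valued macro actions are chosen more often and low-valued ones rarely. The function names, the macro action names, the numeric values, and the temperature parameter are all hypothetical and are not taken from the paper.

```python
import math
import random


def softmax_preferences(state_values, temperature=1.0):
    """Map each macro action's state value to a selection probability.

    Assumption: Boltzmann weighting; the paper's actual rule may differ.
    """
    # Subtract the largest value before exponentiating for numerical stability.
    peak = max(state_values.values())
    weights = {
        action: math.exp((value - peak) / temperature)
        for action, value in state_values.items()
    }
    total = sum(weights.values())
    return {action: weight / total for action, weight in weights.items()}


def select_macro_action(state_values, temperature=1.0, rng=random):
    """Sample one macro action according to the softmax probabilities."""
    probs = softmax_preferences(state_values, temperature)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]


# Hypothetical state values of three macro actions in the passer's current state.
values = {"pass_to_teammate": 0.8, "shoot": 0.5, "dribble": 0.1}
probs = softmax_preferences(values, temperature=0.2)
chosen = select_macro_action(values, temperature=0.2)
```

A lower temperature concentrates the choice on the highest-valued macro action, while a higher one keeps more exploration. Either way, the bias comes from values learned beforehand rather than from a uniform random start.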


Bibliographic details

Source: RoboCup International Symposium, 2010, 11 pages
Authors: Kouki Shimada; Yasutake Takahashi; Minoru Asada
Format: PDF
Chinese Library Classification: TP18-53

