International Conference on Systems, Man, and Cybernetics

A shaped-q learning for multi-agents systems

Abstract

This paper proposes an architecture in which each agent maintains a cooperative tendency table (CTT). During learning, agents do not need to communicate with each other; instead, they observe their partners' actions while taking their own. If an agent encounters a bad situation, such as bumping into an obstacle after taking an action, the agents receive a bad reward from the environment. Similarly, if an agent reaches a goal after taking an action, the agents obtain a good reward instead. Rewards are used to update the policy and to adjust the cooperative tendency values recorded in each agent's CTT. When an agent perceives a state, the corresponding cooperative tendency value and Q-value are merged into a Shaped-Q value, and the action with the maximal Shaped-Q value in that state is selected. After the agents take actions and receive a reward, they update their own CTTs. With this method, agents can reach a consensus more quickly, which enhances learning efficiency and reduces the occurrence of stagnation. Simulation results demonstrate that the proposed method speeds up the learning process, alleviates the problem of large memory consumption to some degree, and enables agents to complete the task together more efficiently.
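
To make the mechanism concrete, below is a minimal sketch of how a tabular agent might merge its Q-table with a cooperative tendency table when selecting actions. The class name CTTAgent, the merge weight beta, and all learning constants are illustrative assumptions for a grid-world-style task, not details given in the paper.

```python
import random
from collections import defaultdict

class CTTAgent:
    """Sketch of an agent combining a Q-table with a cooperative tendency table (CTT)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, beta=0.5, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma   # Q-learning step size and discount (assumed values)
        self.beta = beta                        # weight of the cooperative tendency term (assumption)
        self.epsilon = epsilon                  # exploration rate
        self.q = defaultdict(float)             # Q-table: (state, action) -> value
        self.ctt = defaultdict(float)           # cooperative tendency table: (state, action) -> tendency

    def shaped_q(self, state, action):
        # Merge the Q-value and the cooperative tendency value into a single Shaped-Q score.
        return self.q[(state, action)] + self.beta * self.ctt[(state, action)]

    def select_action(self, state):
        # Epsilon-greedy selection over the Shaped-Q value: the action with the
        # maximal Shaped-Q value in this state is chosen most of the time.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.shaped_q(state, a))

    def update(self, state, action, reward, next_state):
        # Standard tabular Q-learning update on the agent's own Q-table.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
        # Adjust the cooperative tendency value with the received reward
        # (the exact adjustment rule is not specified in the abstract; this is an assumption).
        self.ctt[(state, action)] += self.alpha * reward
```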
