Distributing Rewards by Strategic Knowledge based on Nash-Q Learning

Abstract

In this investigation, we examine a collaborative approach to reward distribution among multiple players in repeated general-sum stochastic games, in terms of position and rewards. Several studies of reward distribution have been reported so far, and reinforcement learning has been considered useful because no knowledge is needed in advance and better decisions can be extracted while learning. Among these methods, Q-learning has received much attention in single-agent environments. In multi-agent environments, however, there is no clear target for this problem: what is the optimal principle? In this work, we discuss how to distribute rewards by treating the problem as a general stochastic game, based on game theory. That is, we introduce the Nash-Q approach, which combines Nash equilibrium with Q-learning. We show that the new approach provides a new strategic solution. We discuss experiments on rather complicated games (game of life) to assess the usefulness of the approach.

Bibliographic record

Source: International Conference on Applications of Digital Information and Web Technologies, 2008, 6 pages
Authors: Kazuo IGOSHI; Takao MIURA; Isamu SHIOYA
Original format: PDF
Chinese Library Classification: TP393-53
