IEEJ Transactions on Electrical and Electronic Engineering

Distributed deep reinforcement learning method using profit sharing for learning acceleration


Abstract

Profit Sharing (PS), a reinforcement learning method that strongly reinforces successful experiences, has been shown to improve learning speed when combined with a deep Q-network (DQN). We expect a further improvement in learning speed by integrating PS-based learning with the Ape-X DQN, which offers state-of-the-art learning speed, in place of the DQN. However, PS-based learning does not use replay memory, whereas the Ape-X DQN requires it because exploration of the environment to collect experiences and network training are performed asynchronously. In this study, we propose Learning-accelerated Ape-X, which integrates the Ape-X DQN and PS-based learning with several improvements, including the use of replay memory. We show through numerical experiments that the proposed method improves scores in Atari 2600 video games in a shorter time than the Ape-X DQN. (c) 2020 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
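The core idea of Profit Sharing described above is credit assignment along a successful episode: when an episode ends with a reward, every state-action pair visited on the way is reinforced, with credit decaying geometrically for steps farther from the success. A minimal tabular sketch of this idea (the function name, decay schedule, and learning rate are illustrative assumptions, not the paper's exact formulation) might look like:

```python
from collections import defaultdict

def profit_sharing_update(q, episode, reward, decay=0.5, lr=0.1):
    """Reinforce every (state, action) pair along a successful episode.

    Credit propagates backward from the terminal reward with a
    geometrically decaying credit-assignment function, so steps
    closer to the success are reinforced more strongly.
    q       -- table mapping (state, action) -> value
    episode -- list of (state, action) pairs in visit order
    reward  -- terminal reward obtained at the end of the episode
    """
    credit = reward
    for state, action in reversed(episode):
        q[(state, action)] += lr * credit  # reinforce this step
        credit *= decay                    # weaker credit further back
    return q

# Example: a two-step episode ending in a reward of 1.0.
q = defaultdict(float)
profit_sharing_update(q, [("s0", "a"), ("s1", "b")], reward=1.0)
```

In this sketch the last step (`("s1", "b")`) receives the full decayed credit `lr * 1.0 = 0.1`, while the earlier step receives `lr * 0.5 = 0.05`; contrast this with standard one-step Q-learning, which would update only the most recent transition.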
