Internet of Things Journal, IEEE

Parallel Reinforcement Learning With Minimal Communication Overhead for IoT Environments



Abstract

Many Internet of Things (IoT) applications require a distributed architecture for decision making, either because no centralized system is available, because connectivity to a centralized system is failure-prone, or because the latency of contacting such a system is too high for real-time applications. These IoT applications often fall in the domain of reinforcement learning (RL), e.g., autonomous robot navigation in smart factories and traffic signal control in smart cities. However, RL-based applications require a long learning time. To overcome this limitation and scale with the number of agents, parallel RL (PRL) algorithms run multiple RL agents in parallel on distributed environments. However, deploying PRL algorithms in such environments entails a communication overhead that increases the (actual) execution time. State-of-the-art PRL algorithms are designed to reduce the learning time while assuming no (or limited) communication overhead. In this article, we present a novel partitioning algorithm that minimizes the communication overhead of PRL running on IoT environments. To the best of our knowledge, this is the first work that focuses on reducing the communication overhead of distributed PRL algorithms without requiring any a priori knowledge about the structure of the problem. The proposed algorithm intelligently combines a dynamic state-partitioning strategy, which exploits the agents' exploration capabilities to build partition knowledge while learning, with an efficient mapping of agents to partitions, which reduces the communication among agents. Performance evaluations show that the proposed algorithm achieves almost no communication among PRL agents at the converged state.
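The abstract's core claim is that communication cost in PRL depends on how states are partitioned and mapped to agents: if each agent mostly transitions within its own partition, few Q-value exchanges are needed. The sketch below is a hypothetical, heavily simplified illustration of that idea, not the paper's algorithm: tabular Q-learning on a 1-D chain MDP is simulated under two fixed state-to-agent mappings, and a message is counted whenever a transition crosses a partition boundary, i.e., whenever the acting agent would need Q-values owned by another agent. The chain environment, the mappings, and all names are our own assumptions for illustration.

```python
import random
from collections import defaultdict

# Hypothetical sketch (not the paper's algorithm): count cross-partition
# "messages" in Q-learning on a 1-D chain, under two state-to-agent mappings.

N_STATES, N_AGENTS, GOAL = 24, 4, 23
ACTIONS = (-1, +1)  # move left / right along the chain

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

def run(owner, episodes=500, alpha=0.5, gamma=0.95, eps=0.2):
    """Q-learning; returns the number of cross-partition transitions."""
    Q = defaultdict(float)
    messages = 0
    for _ in range(episodes):
        s = random.randrange(N_STATES)
        for _ in range(50):
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            if owner(s) != owner(s2):  # Q-values of s2 live on another agent
                messages += 1
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS)
                                  - Q[(s, a)])
            if s2 == GOAL:
                break
            s = s2
    return messages

def interleaved(s):  # mapping that ignores transition locality
    return s % N_AGENTS

def blocks(s):       # contiguous partitions that match chain locality
    return s // (N_STATES // N_AGENTS)

random.seed(0)
print("interleaved mapping:", run(interleaved), "cross-partition messages")
print("block mapping      :", run(blocks), "cross-partition messages")
```

Running the sketch, the interleaved mapping crosses partitions on almost every step, while the block mapping communicates only at the few block boundaries. The paper's dynamic state-partitioning strategy would, in effect, discover such locality-respecting partitions online from the agents' exploration statistics rather than assuming them in advance, which is what drives communication toward zero at the converged state.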
