System with Hybrid Communication Strategy for Large-Scale Distributed Deep Learning
Abstract
A computer in a distributed computing system is disclosed. The computer includes: a graphics processing unit (GPU) memory; a central processing unit (CPU) memory comprising a key-value store (KVS) module; an execution engine module configured to run a deep learning (DL) program to create a plurality of operator graph layers in the GPU memory; a client library module configured to create a GPU-CPU synchronization (GCS) module for each of the plurality of operator graph layers; and a coordination service module configured to compute the network cost of a first and a second communication scheme and to select, based on the network cost, one of the two schemes for transmitting data associated with one of the plurality of operator graph layers from the corresponding GCS module.
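The per-layer selection performed by the coordination service module can be illustrated with a minimal Python sketch. The abstract does not name the two communication schemes or give their cost formulas, so everything here is an assumption made for illustration: the scheme labels `"kvs"` (exchange the full layer matrix through key-value store nodes) and `"broadcast"` (send compact per-sample factors directly to peer workers), the `Layer` type, the two cost functions, and the parameters `workers`, `servers`, and `batch_size`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical description of one operator graph layer with an M x N parameter matrix."""
    name: str
    rows: int  # M
    cols: int  # N


def kvs_cost(layer: Layer, workers: int, servers: int) -> float:
    # Assumed cost model: the full M x N matrix is sent and received,
    # with the traffic spread across the key-value store nodes.
    return 2 * layer.rows * layer.cols * (workers + servers - 2) / servers


def broadcast_cost(layer: Layer, workers: int, batch_size: int) -> float:
    # Assumed cost model: each worker exchanges batch_size factor pairs
    # of total length M + N with every other worker.
    return 2 * batch_size * (workers - 1) * (layer.rows + layer.cols)


def select_scheme(layer: Layer, workers: int, servers: int, batch_size: int) -> str:
    """Return the label of the scheme with the lower estimated network cost."""
    costs = {
        "kvs": kvs_cost(layer, workers, servers),
        "broadcast": broadcast_cost(layer, workers, batch_size),
    }
    # On a tie, min() keeps the first key, so "kvs" wins.
    return min(costs, key=costs.get)


if __name__ == "__main__":
    for layer in (Layer("fc", 4096, 4096), Layer("conv", 64, 576)):
        print(layer.name, select_scheme(layer, workers=8, servers=8, batch_size=32))
```

Under these assumed cost models, a large square layer favors the factor broadcast while a small layer favors the key-value store path, which is the kind of per-layer decision the abstract describes.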