EFFICIENT AND MORE ADVANCED IMPLEMENTATION OF RING-ALLREDUCE ALGORITHM FOR DISTRIBUTED PARALLEL DEEP LEARNING
Abstract
The present disclosure provides a method for syncing data of a computing task across a plurality of groups of computing nodes. Each group comprises a set of computing nodes A-D; a set of intra-group interconnects that communicatively couple computing node A with computing nodes B and C, and computing node D with computing nodes B and C; and a set of inter-group interconnects that communicatively couple each computing node of a first group of the plurality of groups with the corresponding computing node of a second group neighboring the first group (node A with node A, node B with node B, node C with node C, and node D with node D). The method comprises: syncing across a first dimension of computing nodes using a first set of ring connections, wherein the first set of ring connections is formed using the inter-group and intra-group interconnects that communicatively couple the computing nodes along the first dimension; and broadcasting the synced data across a second dimension of computing nodes using a second ring connection.
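The two steps named in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the disclosed implementation: it uses the textbook ring-allreduce (reduce-scatter followed by all-gather) for the first-dimension sync and a hop-by-hop ring broadcast for the second dimension. The function names `ring_allreduce`, `ring_broadcast`, and `sync_grid`, the row/column grid layout, and the choice of row 0 as the broadcast root are all hypothetical; the abstract does not specify them.

```python
from typing import List

Vector = List[float]


def ring_allreduce(buffers: List[Vector]) -> List[Vector]:
    """Simulate a sum ring-allreduce over n nodes connected in a ring.

    buffers[i] is node i's local vector; its length must be divisible by n.
    Returns one vector per node, each holding the element-wise sum.
    """
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "vector length must be divisible by ring size"
    chunk = size // n
    data = [list(b) for b in buffers]  # copy so the inputs stay untouched

    def span(c: int) -> range:
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1 (reduce-scatter): in step s, node i sends chunk (i - s) mod n
    # to node i + 1, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(i, (i - s) % n) for i in range(n)]
        payloads = [[data[i][k] for k in span(c)] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            for k, v in zip(span(c), payload):
                data[dst][k] += v

    # Node i now holds the fully reduced chunk (i + 1) mod n.
    # Phase 2 (all-gather): in step s, node i sends chunk (i + 1 - s) mod n
    # to node i + 1, which overwrites its copy with the reduced values.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n) for i in range(n)]
        payloads = [[data[i][k] for k in span(c)] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            for k, v in zip(span(c), payload):
                data[dst][k] = v
    return data


def ring_broadcast(buffers: List[Vector], root: int = 0) -> List[Vector]:
    """Forward the root's vector around the ring, one hop at a time."""
    n = len(buffers)
    data = [list(b) for b in buffers]
    for hop in range(n - 1):
        src = (root + hop) % n
        dst = (src + 1) % n
        data[dst] = list(data[src])
    return data


def sync_grid(grid: List[List[Vector]]) -> List[List[Vector]]:
    """Hypothetical two-step sync on a rows x columns grid of nodes.

    Step 1: ring-allreduce along each row (assumed "first dimension").
    Step 2: ring-broadcast row 0's result down each column
            (assumed "second dimension").
    """
    rows = [ring_allreduce(row) for row in grid]
    for c in range(len(rows[0])):
        column = ring_broadcast([row[c] for row in rows], root=0)
        for r, vec in enumerate(column):
            rows[r][c] = vec
    return rows
```

In this sketch every node ends up with the reduced vector of row 0. Whether the disclosed method also reduces along the second dimension before broadcasting is not stated in the abstract, so the sketch leaves that step out.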