【24h】

A Simple Work-Optimal Broadcast Algorithm for Message-Passing Parallel Systems

机译:消息传递并行系统的一种简单的工作最优广播算法

获取原文
获取原文并翻译 | 示例

摘要

In this note we give a simple bandwidth- and latency optimal algorithm for the problem of broadcasting m units of data from a distinguished root processor to all p - 1 other processors in one-ported (hypercubic) message-passing systems. Assuming linear, uniform communication cost, the time for the broadcast to complete is O(m+log_2 p), more precisely no processor is involved in more than [log_2 p] communication operations (send, receive, and send-receive), and for any constant message size threshold b each processor (except the root) sends at most m - b′ + ([log_2 p] - l)b′ units of data, where b′ is determined by the smallest l ≤ [log_2 p] such that b′ = m/2~l ≤ b (the root sends 2m - b′ + ([log_2 p] - l)b′ units of data). Non-root processors receive m units of data. Building on known ideas, the salient features of the algorithm presented here is its simplicity of implementation, and smooth transition from latency to bandwidth dominated performance as data size m increases. The implementation performs very well in practice.
机译:在本说明中,我们给出了一个简单的带宽和等待时间最佳算法,用于解决在单端口(超立方)消息传递系统中将m个数据单元从专有根处理器广播到所有p-1个其他处理器的问题。假设线性,统一的通信成本,则完成广播的时间为O(m + log_2 p),更准确地说,没有任何处理器参与超过[log_2 p]个通信操作(发送,接收和发送-接收),并且对于任何恒定消息大小阈值b,每个处理器(根除外)最多发送m-b'+([log_2 p]-l)b'数据单位,其中b'由最小l≤[log_2 p]确定使得b'= m / 2〜l≤b(根发送2m-b'+([log_2 p]-l)b'数据单位)。非根处理器接收m个数据单元。在已知思想的基础上,此处介绍的算法的显着特征是其实现简单,并且随着数据大小m的增加,从延迟到带宽主导的性能平稳过渡。该实施在实践中表现很好。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号