首页> 外文会议>IEEE symposium on parallel and distributed processing >The reconfigurable ring of processors: fine-grained tree-structured computations
【24h】

The reconfigurable ring of processors: fine-grained tree-structured computations

机译:可重新配置的处理器环:细粒度的树结构计算

获取原文

摘要

We study fine-grained parallel computation on a reconfigurable ring of processors (denoted by /spl Rscr//spl Rscr//spl Pscr/). The ring of processors is endowed with a very flexible reconfigurable bus. The bus has some number of lines, each having one-packet width, that can be configured to establish arbitrary point-to-point connections independently for each line. We assume that the /spl Rscr//spl Rscr//spl Pscr/s we study have been implemented so that the latency for transmitting messages is logarithmic in the number of processors the message passes over in transit. We present an algorithm that allows an N-processor /spl Rscr//spl Rscr//spl Pscr/ with w lines to perform the broadcast operation (and any "leveled tree-structured" computation like parallel prefix) in time at most (log/sup 2/ N/log w)+log N log log w. We prove that this algorithm's performance can be improved by at most a constant factor, both when the buswidth w is "small", so that the first term dominates, and when w is "large", so that the second term dominates. Further, we expose a fundamental, architecture-independent limitation imposed by the logarithmic communication latency model: we prove that for a broad range of parallel architectures, including any N-processor /spl Rscr//spl Rscr//spl Pscr/, any operation that requires one processor to receive information-directly or indirectly-from all other processors, requires time proportional to log N log log N.
机译:我们在可重新配置的处理器环上研究细粒度的并行计算(由/ SPL RSCR // SPL RSCR // SPL PSCR /)表示。处理器的环赋予了一个非常灵活的可重新配置总线。总线具有一些线路,每个线路具有单数据包宽度,可以配置为为每条线独立地建立任意点对点连接。我们假设我们研究的/ SPL rscr // spl rscr // spl pscr / s已经实现,以便在传输中传输的处理器数量的处理器的数量中,发送消息的延迟是对数。我们介绍了一种允许N-Processor / SPL RSCR // SPL rscr // spl / spl / spl / with的算法,以便最多及时执行广播操作(以及任何“级别的树结构”等于并行前缀)(日志/ sup 2 / n / log w)+ log n日志log w。我们证明了这种算法的性能可以通过最多的恒定因素来改善,当总线w为“小”时,都可以提高,使得第一个术语主导,并且当w是“大”时,使第二术语主导。此外,我们公开了对数通信延迟模型施加的基本,架构的独立限制:我们证明了广泛的并行架构,包括任何n-processor / spl rscr // spl rscr // spl pscr /,任何操作这需要一个处理器直接或间接地从所有其他处理器接收信息,需要时间与日志N日志日志N成比例。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号