Conference: 2011 17th IEEE International Conference on Parallel and Distributed Systems
Reflex Barrier: A Scalable Network-Based Synchronization Barrier

Abstract

High-performance computing is witnessing the proliferation of multi-core processors in parallel architectures, and the trend is expected to intensify with emerging many-core technology, leading to hundreds of processing cores within each compute node in the near future. Alongside this trend, the total number of cores in the whole system is also clearly increasing. To harvest the fruits of this massive parallelism, inter-process synchronization and communication should be as lightweight as possible and should involve the participating processors/cores as little as possible. Synchronization algorithms that target shared-memory processors are not expected to scale on many-core systems, because they rely on atomics, locks, and/or cache coherence protocols, all of which are expected to be very costly operations on many-cores. At the same time, some many-core architectures provide user-space networks on chip (NoCs) that operate similarly to regular networks. In this paper, we introduce the Reflex barrier, a new synchronization barrier algorithm that relies on fundamental networking concepts. Because the barrier relies on the characteristics of the network, it requires very little intervention from the participating processors/cores. The algorithm can also be implemented as a split-phase barrier, which provides an opportunity to reduce the synchronization cost. We implemented the algorithm using Unified Parallel C (UPC), MPI, and Pthreads. We tested our implementation on TILE64, a 64-core processor. The performance of the Reflex barrier is also analyzed and compared to other algorithms using performance models.
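
To make the split-phase idea concrete, the sketch below shows a generic split-phase barrier interface in Python. It is not the Reflex algorithm, whose details the abstract does not give. It is a conventional lock-and-counter barrier of the shared-memory kind the abstract argues will not scale. The names `SplitPhaseBarrier`, `notify`, `wait`, and `run_demo` are hypothetical and chosen only for illustration. The point is the interface: a participant signals arrival, does independent work, and only then blocks.

```python
import threading


class SplitPhaseBarrier:
    """Generic split-phase barrier (illustrative only; NOT the Reflex algorithm)."""

    def __init__(self, parties):
        self.parties = parties
        self._arrived = 0       # arrivals seen in the current round
        self._generation = 0    # number of completed rounds
        self._cond = threading.Condition()

    def notify(self):
        """Phase 1: signal arrival without blocking; returns a round ticket."""
        with self._cond:
            ticket = self._generation
            self._arrived += 1
            if self._arrived == self.parties:
                # Last arrival completes the round and releases all waiters.
                self._arrived = 0
                self._generation += 1
                self._cond.notify_all()
            return ticket

    def wait(self, ticket):
        """Phase 2: block until every party has signaled for this round."""
        with self._cond:
            self._cond.wait_for(lambda: self._generation > ticket)


def run_demo(parties=4, rounds=3):
    """Each worker records how many arrivals it sees after each barrier."""
    barrier = SplitPhaseBarrier(parties)
    arrivals = [0] * rounds
    observed = []
    lock = threading.Lock()

    def worker():
        for r in range(rounds):
            with lock:
                arrivals[r] += 1
            ticket = barrier.notify()
            _ = sum(range(1000))  # independent work overlapped with the barrier
            barrier.wait(ticket)
            with lock:
                observed.append(arrivals[r])

    threads = [threading.Thread(target=worker) for _ in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed
```

Calling `run_demo(4, 3)` returns one observation per worker per round. Each observation should equal the number of parties, because no worker passes `wait` until all have called `notify`. The work placed between the two calls is where a split-phase barrier can hide synchronization latency.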
