...
首页> 外文期刊>IEEE Transactions on Parallel and Distributed Systems >Efficient algorithms for block-cyclic array redistribution between processor sets
【24h】

Efficient algorithms for block-cyclic array redistribution between processor sets

机译:处理器集之间的块循环数组重新分配的高效算法

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

Run-time array redistribution is necessary to enhance the performance of parallel programs on distributed memory supercomputers. In this paper, we present an efficient algorithm for array redistribution from cyclic(x) on P processors to cyclic(Kx) on Q processors. The algorithm reduces the overall time for communication by considering the data transfer, communication schedule, and index computation costs. The proposed algorithm is based on a generalized circulant matrix formalism. Our algorithm generates a schedule that minimizes the number of communication steps and eliminates node contention in each communication step. The network bandwidth is fully utilized by ensuring that equal-sized messages are transferred in each communication step. Furthermore, the time to compute the schedule and the index sets is significantly smaller. It takes O(max(P, Q)) time and is less than 1 percent of the data transfer time. In comparison, the schedule computation time using the state-of-the-art scheme (which is based on the bipartite matching scheme) is 10 to 50 percent of the data transfer time for similar problem sizes. Therefore, our proposed algorithm is suitable for run-time array redistribution. To evaluate the performance of our scheme, we have implemented the algorithm using C and MPI on an IBM SP2. Results show that our algorithm performs better than the previous algorithms with respect to the total redistribution time, which includes the time for data transfer, schedule, and index computation.
机译:运行时数组重新分发对于增强分布式内存超级计算机上并行程序的性能是必需的。在本文中,我们提出了一种从P处理器上的循环(x)到Q处理器上的循环(Kx)的阵列重新分配的有效算法。该算法通过考虑数据传输,通信调度和索引计算成本来减少通信的总时间。所提出的算法基于广义循环矩阵形式。我们的算法生成了一个调度表,该调度表将通信步骤的数量减至最少,并消除了每个通信步骤中的节点争用。通过确保在每个通信步骤中传输大小相等的消息,可以充分利用网络带宽。此外,计算计划表和索引集的时间明显更少。它需要O(max(P,Q))时间,并且小于数据传输时间的1%。相比之下,对于类似的问题大小,使用最新方案(基于二元匹配方案)的调度计算时间为数据传输时间的10%到50%。因此,我们提出的算法适合于运行时数组重新分配。为了评估我们方案的性能,我们在IBM SP2上使用C和MPI实现了该算法。结果表明,在总重新分配时间(包括数据传输,调度和索引计算的时间)方面,我们的算法比以前的算法性能更好。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号