We present efficient algorithms for two all-to-all communication operations in message-passing systems: index (or all-to-all personalized communication) and concatenation (or all-to-all broadcast). We assume a model of a fully connected message-passing system, in which the performance of any point-to-point communication is independent of the sender-receiver pair. We also assume that each processor has k ≥ 1 ports, through which it can send and receive k messages in every communication round. The complexity measures we use are independent of the particular system topology and are based on the communication start-up time, and on the communication bandwidth. ududIn the index operation among n processors, initially, each processor has n blocks of data, and the goal is to exchange the ith block of processor j with the jth block of processor i. We present a class of index algorithms that is designed for all values of n and that features a trade-off between the communication start-up time and the data transfer time. This class of algorithms includes two special cases: an algorithm that is optimal with respect to the measure of the start-up time, and an algorithm that is optimal with respect to the measure of the data transfer time. We also present experimental results featuring the performance tuneability of our index algorithms on the IBM SP-1 parallel system. ududIn the concatenation operation, among n processors, initially, each processor has one block of data, and the goal is to concatenate the n blocks of data from the n processors, and to make the concatenation result known to all the processors. We present a concatenation algorithm that is optimal, for most values of n, in the number of communication rounds and in the amount of data transferred.
展开▼
机译:我们为消息传递系统中的两种全部通信操作提供了有效的算法:索引(或全部到所有个性化通信)和串联(或全部到所有广播)。我们假设一个完全连接的消息传递系统的模型,其中任何点对点通信的性能都独立于发送方-接收方对。我们还假定每个处理器具有k≥1个端口,通过该端口,它可以在每个通信回合中发送和接收k条消息。我们使用的复杂性度量独立于特定的系统拓扑,并基于通信启动时间和通信带宽。 ud ud在n个处理器之间进行索引操作时,最初,每个处理器具有n个数据块,目标是将处理器j的第i个块与处理器i的第j个块进行交换。我们提出了一类针对n的所有值设计的索引算法,其特征是在通信启动时间和数据传输时间之间进行权衡。这类算法包括两种特殊情况:一种是在启动时间方面最佳的算法,另一种是在数据传输时间方面最佳的算法。我们还介绍了具有IBM SP-1并行系统上索引算法的性能可调整性的实验结果。 ud ud在级联操作中,最初在n个处理器之间,每个处理器都有一个数据块,并且目标是将n个处理器中的n个数据块进行级联,并使所有处理器知道级联结果。我们提出一种级联算法,对于大多数n值,在通信回合数和传输的数据量方面都是最佳的。
展开▼