Communication latency is a central concern in multiprocessor design. This report presents the design principles of the EM-X multiprocessor for tolerating communication latency. Multithreading is built into the EM-X to overlap communication with computation. In particular, we present two types of hardware support for remote memory access: (1) priority-based packet scheduling for thread invocation, and (2) a direct remote memory access mechanism. The priority-based scheduling policy extends a FIFO-ordered thread invocation policy to adapt to different computational needs. The direct remote memory access mechanism, based on non-preemptive thread execution, is designed to overlap remote memory operations with thread execution. We give two examples to explain our approach. The 80-processor prototype of the EM-X is currently being fabricated and is expected to be operational in the near future. Preliminary evaluation indicates that the EM-X can effectively overlap computation and communication, thereby tolerating communication latency in high-performance parallel computing.
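
As a rough illustration of how a priority-based policy can extend FIFO-ordered thread invocation, the following software sketch keeps two packet queues and serves the high-priority queue first, preserving arrival order within each level. It is not the EM-X hardware design: the class `PacketScheduler`, the two-level split, and the `run` helper are hypothetical names and assumptions introduced here only to make the idea concrete.

```python
from collections import deque


class PacketScheduler:
    """Hypothetical two-level packet queue (an assumption, not the EM-X design).

    Packets are served in FIFO order within each priority level, and
    high-priority packets are always served before low-priority ones.
    """

    def __init__(self):
        self.high = deque()
        self.low = deque()

    def enqueue(self, packet, high_priority=False):
        # Arrival order is preserved inside each queue.
        (self.high if high_priority else self.low).append(packet)

    def dequeue(self):
        # Serve the high-priority queue first, then fall back to plain FIFO.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


def run(scheduler, handler):
    """Invoke one thread per packet; each handler runs to completion
    (non-preemptive) before the next packet is dequeued."""
    order = []
    while (packet := scheduler.dequeue()) is not None:
        handler(packet)
        order.append(packet)
    return order
```

With only low-priority packets the sketch degenerates to the plain FIFO policy; marking a packet as high priority lets it be invoked ahead of earlier arrivals without interrupting a thread that is already running.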