This paper discusses the design and implementation of MPI's general active target synchronization on the Intel Single-Chip Cloud Computer, a non-cache-coherent many-core CPU. Measurements show a performance benefit of a factor of four compared to the default SCC-tuned MPI implementation and demonstrate the feasibility of implementing efficiently a shared memory protocol despite the lack of cache coherence. Further, a classification of implementation designs of MPI's general active target synchronization is presented.
展开▼