Modern processors have multiple cores on a chip to overcome power consumption and heat dissipation issues. As more and more compute cores become available on a single node, it is expected that node-local communication will play an increasingly greater role in overall performance of parallel applications such as MPI applications. It is therefore crucial to optimize intra-node communication paths utilized by MPI libraries. In this paper, we propose a novel design of a kernel extension, called LiMIC2, for high-performance MPI intra-node communication over multi-core systems. LiMIC2 can minimize the communication overheads by implementing lightweight primitives and provide portability across different interconnects and flexibility for performance optimization. Our performance evaluation indicates that LiMIC2 can attain 80% lower latency and more than three times improvement in bandwidth. Also the experimental results show that LiMIC2 can deliver bidirectional bandwidth greater than 11GB/s.
展开▼