IEEE International Symposium on High Performance Computer Architecture

MOPED: Orchestrating interprocess message data on CMPs


Abstract

Future CMPs will combine many simple cores with deep cache hierarchies. With more cores, cache resources per core are fewer, and must be shared carefully to avoid poor utilization due to conflicts and pollution. Explicit motion of data in these architectures, such as message passing, can provide hints about program behavior that can be used to hide latency and improve cache behavior. However, to make these models attractive, synchronization overhead and data copying must also be offloaded from the processors. In this paper, we describe a Message Orchestration and Performance Enhancement Device (MOPED) that provides hardware mechanisms to support state-of-the-art message passing protocols such as MPI. MOPED extends the per-processor cache controllers and coherence protocol to support message synchronization and management in hardware, to transfer message data efficiently without intermediate buffer copies, and to place useful data in caches in a timely manner. MOPED thus allows full overlap between communication and computation on the cores. We extended a 16-core full-system simulator based on Simics and FeS2. MOPED interacts with the directory controllers to orchestrate message data. We evaluated benefits to performance and coherence traffic by integrating MOPED into the MPICH runtime. Relative to unmodified MPI execution, MOPED reduces execution time of real applications (NAS Parallel Benchmarks) by 17–45% and of communication microbenchmarks (Intel's IMB) by 76–94%. Off-chip memory misses are reduced by 43–88% for applications and by 75–100% for microbenchmarks.
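
The reported reductions in execution time can also be read as speedup factors. The short Python sketch below does that conversion for the ranges quoted in the abstract; it assumes the percentages are reductions relative to the unmodified MPI baseline, and the helper name `speedup_from_time_reduction` is illustrative, not something from the paper.

```python
def speedup_from_time_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in execution time into a speedup factor.

    A reduction of r% means the new time is (1 - r/100) of the baseline,
    so the speedup is baseline / new = 1 / (1 - r/100).
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Execution-time reduction ranges as quoted in the abstract (percent).
reported = {
    "NAS Parallel Benchmarks": (17, 45),
    "Intel IMB microbenchmarks": (76, 94),
}

for name, (low, high) in reported.items():
    print(
        f"{name}: {speedup_from_time_reduction(low):.2f}x "
        f"to {speedup_from_time_reduction(high):.2f}x"
    )
```

Under that reading, the application results correspond to speedups of roughly 1.2x to 1.8x, and the microbenchmark results to roughly 4.2x to 16.7x.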
