Conference: 2012 IEEE 26th International Parallel and Distributed Processing Symposium

SyncChecker: Detecting Synchronization Errors between MPI Applications and Libraries

Abstract

Although nonblocking communication improves performance, it is prone to synchronization errors between MPI applications and the underlying MPI libraries. Such an error occurs as follows: after initiating nonblocking communication and performing overlapped computation, the MPI application reuses the message buffer before the MPI library has finished using the same buffer, which may lead to sending out corrupted message data or reading undefined message data. This paper presents a new method called SyncChecker to detect synchronization errors in MPI nonblocking communication. To examine whether the use of message buffers is well synchronized between the MPI application and the MPI library, SyncChecker first tracks relevant memory accesses in the MPI application and the corresponding message send/receive operations in the MPI library. It then checks whether the MPI completion check routines enforce the correct execution order between the MPI application and the MPI library. If they do not, SyncChecker reports the error with diagnostic information. To reduce runtime overhead, we propose three dynamic optimizations. We have implemented a prototype of SyncChecker on Linux and evaluated it with seven bug cases in four different MPI applications: five introduced by the original developers and two injected. Our experiments show that SyncChecker detects all the evaluated synchronization errors and provides helpful diagnostic information. Moreover, our experiments with seven NAS Parallel Benchmarks demonstrate that SyncChecker incurs moderate runtime overhead of 1.3 to 9.5 times, with an average of 5.2 times, making it suitable for software testing.
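The kind of ordering check described in the abstract can be illustrated with a toy trace checker. This is only a minimal sketch under stated assumptions, not SyncChecker's actual implementation: the `Event` record, the `check_trace` function, the string buffer names, and the integer request handles are all hypothetical, and a real tool would track address ranges and instrument the MPI library rather than consume a hand-written event list. The sketch flags a write to a buffer while a nonblocking send on it is still pending, and any read or write of a buffer while a nonblocking receive on it is still pending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One entry in a hypothetical, already-recorded execution trace."""
    kind: str       # "isend", "irecv", "wait", "read", or "write"
    buf: str = ""   # hypothetical buffer name (stands in for an address range)
    req: int = -1   # hypothetical request handle for isend/irecv/wait


def check_trace(events):
    """Return (event_index, message) pairs for unsynchronized buffer uses."""
    pending = {}  # req -> (operation kind, buffer) for uncompleted operations
    errors = []
    for i, ev in enumerate(events):
        if ev.kind in ("isend", "irecv"):
            # The library may use this buffer until the request completes.
            pending[ev.req] = (ev.kind, ev.buf)
        elif ev.kind == "wait":
            # A completion check ends the library's use of the buffer.
            pending.pop(ev.req, None)
        elif ev.kind in ("read", "write"):
            for req, (op, buf) in pending.items():
                if buf != ev.buf:
                    continue
                if op == "isend" and ev.kind == "write":
                    errors.append(
                        (i, f"write to {buf} while isend req {req} is pending"))
                elif op == "irecv":
                    errors.append(
                        (i, f"{ev.kind} of {buf} while irecv req {req} is pending"))
    return errors


# The buffer is overwritten before the send is known to be complete.
buggy = [
    Event("isend", buf="a", req=1),
    Event("write", buf="a"),
    Event("wait", req=1),
]
# Same operations, with the completion check moved before the reuse.
fixed = [
    Event("isend", buf="a", req=1),
    Event("wait", req=1),
    Event("write", buf="a"),
]
print(check_trace(buggy))
print(check_trace(fixed))
```

In this sketch the `buggy` trace yields one report at event index 1, while the `fixed` trace yields none because the completion check precedes the buffer reuse. Treating reads of a pending send buffer as harmless is a simplifying assumption of the sketch.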