
A time-stamping system to detect memory consistency errors in MPI one-sided applications


Abstract

Many high-performance computing applications are built on MPI one-sided communication. Because one-sided communication separates data movement from synchronization, it is hard for programmers to keep such programs reliable. One of the main challenges is detecting memory consistency errors, a notorious class of bugs that degrades both the reliability and the performance of programs. Even MPI experts make these mistakes easily; the lockopts bug, which occurred in an RMA test case of the MPICH MPI implementation, is one example. MC-Checker is the most effective debugger for memory consistency errors. To keep its time overhead acceptable, however, MC-Checker ignores the transitive ordering of the happened-before relation, and this omission is a source of false positives. To address this issue, we propose a time-stamping system based on the encoded vector clock that preserves the full happened-before relation with reasonable overhead. The system is implemented in MC-CChecker, an enhancement of MC-Checker. The experimental results show that MC-CChecker detects memory consistency errors as effectively as MC-Checker and also completely eliminates this potential source of false positives, a major limitation of MC-Checker, while retaining acceptable execution-time and memory overheads. In particular, MC-CChecker scales fairly well when processing the large number of trace files generated by running lockopts on up to 8192 processes. (C) 2019 Elsevier B.V. All rights reserved.
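The abstract does not describe how the time stamps are built. As a rough illustration only, the sketch below follows the general encoded vector clock idea from the literature: each process owns a distinct prime, a clock is a single integer, a local event multiplies the clock by the owner's prime, and a receive merges clocks with the least common multiple. The class name `EncodedClock` and the helper functions are hypothetical and are not taken from MC-CChecker.

```python
from math import gcd


class EncodedClock:
    """Hypothetical encoded vector clock: one integer per process."""

    def __init__(self, prime):
        self.prime = prime  # distinct prime assigned to this process (assumption)
        self.value = 1      # the empty history is encoded as 1

    def tick(self):
        # A local event multiplies the clock by the process's own prime.
        self.value *= self.prime
        return self.value

    def send(self):
        # A send is an event; its time stamp travels with the message.
        return self.tick()

    def receive(self, stamp):
        # Merge with the sender's stamp via LCM, then count the receive event.
        self.value = self.value * stamp // gcd(self.value, stamp)
        return self.tick()


def happened_before(a, b):
    # a -> b exactly when a properly divides b.
    return a != b and b % a == 0


def concurrent(a, b):
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Three processes with primes 2, 3 and 5 (illustrative values).
p0, p1, p2 = EncodedClock(2), EncodedClock(3), EncodedClock(5)
a = p0.send()        # event a on P0
b = p1.receive(a)    # P1 receives from P0
c = p1.send()        # P1 forwards to P2
d = p2.receive(c)    # event d on P2
e = p0.tick()        # later, unrelated event on P0

print(happened_before(a, d))  # a -> b -> c -> d holds transitively
print(concurrent(e, d))       # e and d are unordered
```

Because divisibility is transitive, the ordering of `a` before `d` through the intermediate process is retained without any extra bookkeeping, which is the property the abstract says MC-Checker gives up. The integers grow quickly with the number of events, so a practical tool would need arbitrary-precision arithmetic or periodic resets; how MC-CChecker handles this is not stated here.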

Bibliographic record

  • Source
    Parallel Computing | 2019, No. 8 | pp. 36-44 | 9 pages
  • Author affiliations

    Ho Chi Minh City Univ Technol, Fac Comp Sci & Engn, High Performance Comp Lab, Vnuhcm, Vietnam;

    Ho Chi Minh City Univ Technol, Fac Comp Sci & Engn, High Performance Comp Lab, Vnuhcm, Vietnam;

    Ludwig Maximilians Univ LMU Munich, Comp Sci Dept, MNM Team, Oettingenstr 67, D-80538 Munich, Germany;

    Ho Chi Minh City Univ Technol, Fac Comp Sci & Engn, High Performance Comp Lab, Vnuhcm, Vietnam;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    Memory consistency error; MPI; One-sided communication; Encoded vector clock; MC-CChecker;
