A time-stamping system to detect memory consistency errors in MPI one-sided applications

Abstract

Many high-performance computing applications are built on MPI one-sided communication. Because one-sided communication separates data movement from synchronization, programmers face considerable difficulty in keeping programs reliable. One such difficulty is detecting memory consistency errors, a notorious class of bugs that degrades both the reliability and the performance of programs. Even MPI experts make these mistakes easily; the lockopts bug, which occurred in an RMA test case of the MPICH MPI implementation, is one example. MC-Checker is the most effective debugger for memory consistency errors. However, to keep its time overhead acceptable, MC-Checker ignores the transitive ordering of the happened-before relation, and this omission is a potential source of false positives. To address this issue, we propose a time-stamping system based on the encoded vector clock that preserves the full happened-before relation with reasonable overhead. The system is implemented in MC-CChecker, an enhancement of MC-Checker. The experimental results show that MC-CChecker not only detects memory consistency errors as effectively as MC-Checker, but also completely eliminates the potential source of false positives that is a major limitation of MC-Checker, while still retaining acceptable overheads in execution time and memory usage. In particular, MC-CChecker scales fairly well when processing the large number of trace files generated by running lockopts with up to 8192 processes. (C) 2019 Elsevier B.V. All rights reserved.
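
The abstract names the encoded vector clock as the basis of the time-stamping system but does not show how it works. The following is a minimal sketch of the general encoded vector clock idea, not the MC-CChecker implementation: each process is assumed to own a distinct prime, a timestamp is a single integer, a local event multiplies the timestamp by the owner's prime, a receive takes the least common multiple with the incoming timestamp, and one event happened before another when its timestamp properly divides the other's. The names `PRIMES`, `EncodedVectorClock`, `happened_before`, and `concurrent` are hypothetical and chosen only for illustration.

```python
from math import gcd

# Assumption: one distinct prime per process rank (hypothetical assignment).
PRIMES = [2, 3, 5, 7]


class EncodedVectorClock:
    """A vector clock stored as one integer: the product of prime_i ** count_i."""

    def __init__(self, rank):
        self.prime = PRIMES[rank]
        self.value = 1  # encodes the all-zero vector

    def tick(self):
        # Local event: increment this process's component.
        self.value *= self.prime
        return self.value

    def merge(self, received):
        # Receive event: component-wise maximum is the LCM, followed by a local tick.
        self.value = self.value * received // gcd(self.value, received)
        return self.tick()


def happened_before(a, b):
    # a -> b exactly when a properly divides b.
    return a != b and b % a == 0


def concurrent(a, b):
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Chain P0 -> P1 -> P2, with P3 acting independently.
p0, p1, p2, p3 = (EncodedVectorClock(r) for r in range(4))
e0 = p0.tick()      # P0 local event (timestamp then sent to P1)
e1 = p1.merge(e0)   # P1 receives from P0 (timestamp then sent to P2)
e2 = p2.merge(e1)   # P2 receives from P1
e3 = p3.tick()      # P3 local event, no communication
```

In this sketch `happened_before(e0, e2)` holds even though P0 never communicated with P2 directly, which is the transitive ordering the abstract says MC-Checker ignores, while `concurrent(e0, e3)` holds because neither timestamp divides the other. Timestamps grow multiplicatively, so a real tool would need to bound or reset them; the abstract does not say how MC-CChecker handles this.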

Bibliographic details

  • Source
    Parallel Computing | 2019, No. 8 | pp. 36-44 | 9 pages
  • Author affiliations

    Ho Chi Minh City Univ Technol Fac Comp Sci & Engn High Performance Comp Lab Vnuhcm Vietnam;

    Ho Chi Minh City Univ Technol Fac Comp Sci & Engn High Performance Comp Lab Vnuhcm Vietnam;

    Ludwig Maximilians Univ LMU Munich Comp Sci Dept MNM Team Oettingenstr 67 D-80538 Munich Germany;

    Ho Chi Minh City Univ Technol Fac Comp Sci & Engn High Performance Comp Lab Vnuhcm Vietnam;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Memory consistency error; MPI; One-sided communication; Encoded vector clock; MC-CChecker;
