Conference: Architecture of Computing Systems - ARCS 2012

Using Dynamic Task Level Redundancy for OpenMP Fault Tolerance

Abstract

Obtaining fault-tolerant applications and systems is one of today's most important research topics. Fault tolerance is becoming increasingly essential in shared-memory parallel programs and in multi-core and many-core architectures, because transistor sizes keep shrinking and the number of failures keeps growing. Very few research works have studied techniques for fault-tolerant OpenMP programs, and those few are based on checkpoint and recovery or on static thread-level redundancy. However, these approaches may exhibit scalability issues when the number of cores increases or when the workload is unbalanced. To overcome these issues, this paper presents a dynamic task-level redundancy technique for fault-tolerant OpenMP applications. Our method dynamically applies Triple Modular Redundancy (TMR) to OpenMP tasks through a dedicated runtime, and it uses majority voting to guarantee correct results. We evaluated this flexible fault-tolerant OpenMP approach for performance and fault coverage. It showed a small overhead together with good error detection and recovery rates.