Exploiting data representation for fault tolerance

Elliott J.; Hoemmen M.; Mueller F.

Abstract

Incorrect computer hardware behavior may corrupt intermediate computations in numerical algorithms, possibly resulting in incorrect answers. Prior work models misbehaving hardware by randomly flipping bits in memory. We start by accepting this premise and present an analytic model for the error introduced by a bit flip in an IEEE 754 floating-point number. We then relate this finding to the linear algebra concepts of normalization and matrix equilibration. In particular, we present a case study illustrating that normalizing both vector inputs of a dot product minimizes the probability of a single bit flip causing a large error in the dot product's result. Furthermore, the absolute error is either less than one or very large, which allows detection of large errors. Finally, we apply these results to the GMRES iterative solver. We count all possible errors that can be introduced through faults in arithmetic in the computationally intensive orthogonalization phase of GMRES, and show that when the matrix is equilibrated, the absolute error is bounded above by one. (C) 2016 Elsevier B.V. All rights reserved.
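
To make the bit-flip fault model concrete, here is a minimal sketch, not the authors' code. It inverts a single bit of an IEEE 754 binary64 value and reports the resulting absolute error for every bit position. The helper names `flip_bit` and `flip_errors` are hypothetical. The bit numbering follows the standard binary64 layout: bits 0-51 hold the mantissa, bits 52-62 the biased exponent, and bit 63 the sign.

```python
import math
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its IEEE 754 binary64 encoding inverted.

    Hypothetical helper. Bits 0-51 are the mantissa, 52-62 the biased
    exponent, and 63 the sign (standard binary64 layout).
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 63]")
    # Reinterpret the double as a 64-bit unsigned integer, XOR one bit, and
    # reinterpret the result as a double again.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped


def flip_errors(x: float) -> list[float]:
    """Absolute error |flip_bit(x, k) - x| for every bit position k = 0..63.

    A flip that produces NaN is reported as math.inf; a flip that overflows
    to Inf already yields an infinite error.
    """
    errors = []
    for k in range(64):
        err = abs(flip_bit(x, k) - x)
        errors.append(math.inf if math.isnan(err) else err)
    return errors


if __name__ == "__main__":
    errs = flip_errors(0.5)
    print("largest mantissa-flip error:", max(errs[:52]))  # bits 0-51
    print("exponent-flip errors:", errs[52:63])            # bits 52-62
    print("sign-flip error:", errs[63])                    # bit 63
```

For x = 0.5, every mantissa flip changes the value by at most 0.25 and the sign flip gives an error of exactly 1.0. Flipping the most significant exponent bit (bit 62) turns 0.5 into 2**1023. This is the kind of small-or-huge split the abstract describes. The precise bounds come from the paper's analysis, not from this sketch.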

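The dot-product case study can be approximated by exhaustive fault injection. The sketch below uses a simplified fault model chosen here for illustration: one bit flip in exactly one elementwise product x[i]*y[i] before summation, with an error of 1.0 or more counted as "large". The paper's actual fault model and counting may differ. The names `normalize`, `dot_with_fault`, and `count_large_errors` are hypothetical.

```python
import math
import struct


def flip_bit(x: float, bit: int) -> float:
    """Invert one bit (0 = mantissa LSB, 63 = sign) of an IEEE 754 double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit 2-norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def dot_with_fault(x, y, index=None, bit=None):
    """Dot product; optionally flip `bit` of the product at position `index`.

    Assumed fault model (illustrative, not necessarily the paper's): the
    fault corrupts a single elementwise product before the summation.
    """
    products = [a * b for a, b in zip(x, y)]
    if index is not None:
        products[index] = flip_bit(products[index], bit)
    return sum(products)


def count_large_errors(x, y, threshold=1.0):
    """Count single-bit product faults whose absolute error is >= threshold.

    Every product index and every one of the 64 bit positions is tried once;
    NaN results are counted as large errors.
    """
    clean = dot_with_fault(x, y)
    count = 0
    for i in range(len(x)):
        for bit in range(64):
            err = abs(dot_with_fault(x, y, i, bit) - clean)
            if math.isnan(err) or err >= threshold:
                count += 1
    return count


if __name__ == "__main__":
    x, y = [3.0, 4.0], [1.0, 2.0]
    total = 64 * len(x)
    print("raw:       ", count_large_errors(x, y), "of", total)
    print("normalized:", count_large_errors(normalize(x), normalize(y)), "of", total)
```

With these inputs, 28 of the 128 possible faults exceed the threshold for the raw vectors, compared with 3 after both inputs are normalized. This agrees with the case study's claim for this one example. It is not a proof and does not reproduce the paper's counts.
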
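The GMRES result depends on matrix equilibration, which rescales rows and columns so that entries are bounded by one in magnitude. The sketch below shows one simple scheme in plain Python. It scales by row maxima first and then by column maxima, similar in spirit to LAPACK's `xGEEQU`. Whether this matches the exact equilibration used in the paper is an assumption, and the function name `equilibrate` is hypothetical. The sketch does not reproduce the paper's GMRES error analysis.

```python
def equilibrate(a):
    """Return (b, r, c) with b[i][j] ~= r[i] * a[i][j] * c[j].

    Hypothetical helper, modeled loosely on LAPACK xGEEQU: each row is divided
    by its largest magnitude entry, then each column of the result by its
    largest magnitude entry. Afterwards every |b[i][j]| <= 1, and each nonzero
    row and column has a largest magnitude of exactly 1. Zero rows and columns
    are left unscaled.
    """
    m, n = len(a), len(a[0])
    # Row pass: divide (rather than multiply by a reciprocal) so that each
    # row maximum becomes exactly 1.0 in floating point.
    r, b = [], []
    for row in a:
        mx = max(abs(v) for v in row)
        scale = mx if mx > 0.0 else 1.0
        r.append(1.0 / scale)
        b.append([v / scale for v in row])
    # Column pass on the row-scaled matrix.
    c = []
    for j in range(n):
        mx = max(abs(b[i][j]) for i in range(m))
        scale = mx if mx > 0.0 else 1.0
        c.append(1.0 / scale)
        for i in range(m):
            b[i][j] /= scale
    return b, r, c


if __name__ == "__main__":
    a = [[4.0, 1.0, 0.0], [2.0, 100.0, 3.0], [0.5, 0.25, 8.0]]
    b, r, c = equilibrate(a)
    for row in b:
        print(["%.4f" % v for v in row])
```

To solve A x = f with the scaled matrix, one would solve (D_r A D_c) z = D_r f with D_r = diag(r) and D_c = diag(c), then recover x = D_c z.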

Bibliographic record

Source
Journal of Computational Science, 2016, No. 5, pp. 51-60 (10 pages)
Authors
Elliott J.; Hoemmen M.; Mueller F.
Affiliations

North Carolina State University, Department of Computer Science, Raleigh, NC 27695, USA; Sandia National Laboratories, Center for Computing Research, P.O. Box 5800, Albuquerque, NM 87185, USA

Sandia National Laboratories, Center for Computing Research, P.O. Box 5800, Albuquerque, NM 87185, USA

North Carolina State University, Department of Computer Science, Raleigh, NC 27695, USA

Original format: PDF
Language: English
Keywords
Algorithm-based fault tolerance; Resilient algorithms; Numerical methods

Similar documents

1. A Novel Shorten Erasure Based Reed Solomon Fault Tolerance Code for Road Traffic Data Fault Tolerance [J]. Md. Rafeeq, C. Sunil Kumar, N. Subhash Chandra. International Journal of Soft Computing, 2018, no. 1.
2. Exploiting self-organization and fault tolerance in wireless sensor networks: A case study on wildfire detection application [J]. Felipe Taliar Giuntini, Delano Medeiros Beder, Jó Ueyama. International Journal of Distributed Sensor Networks, 2017, no. 4.
3. Exploiting Data Representation for Fault Tolerance [C]. James Elliott, Mark Hoemmen, Frank Mueller. Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems; International Conference for High Performance Computing, Networking, Storage and Analysis, 2014.
4. Exploiting Asynchrony for Performance and Fault Tolerance in Distributed Graph Processing [D]. Keval Dinesh Vora, 2017.
5. Increasing fault tolerance of data plane on the internet of things using the software-defined networks [O]. Katayoun Bakhshi Kiadehi, Amir Masoud Rahmani, Amir Sabbagh Molahosseini, 2021.
6. Exploiting data representation for fault tolerance [O]. James Elliott, Mark Hoemmen, Frank Mueller, 2014.