Addressing Unreliability in Emerging Devices and Non-von Neumann Architectures Using Coded Computing

Dutta Sanghamitra; Jeong Haewon; Yang Yaoqing; Cadambe Viveck; Low Tze Meng; Grover Pulkit


Abstract

Computing systems are evolving rapidly. At the device level, emerging devices are beginning to compete with traditional CMOS systems. At the architecture level, novel architectures are successfully avoiding the communication bottleneck that is a central feature, and a central limitation, of the von Neumann architecture. Furthermore, such systems are increasingly plagued by unreliability. This unreliability arises at device or gate-level in emerging devices, and can percolate up to processor or system-level if left unchecked. The goal of this article is to survey recent advances in reliable computing using unreliable elements, with an eye on nonsilicon and non-von Neumann architectures. We first observe that instead of aiming for generic computing problems, the community could use "dwarfs of modern computing," first noted in the high-performance computing (HPC) community, as a starting point. These computing problems are the basic building blocks of almost all scientific computing, machine learning, and data analytics today. Next, we survey the state of the art in "coded computing," which is an emerging area that advances on classical algorithm-based fault-tolerance (ABFT) and brings a fundamental information-theoretic perspective. By weaving error-correcting codes into a computing algorithm, coded computing provides dramatic improvements on solutions, as well as obtains novel fundamental limits, for problems that have been open for more than 30 years. We introduce existing and novel coded computing techniques in the context of "coded dwarfs," where a specific dwarf's computation is made resilient by applying coding. We discuss how, for the same redundancy, "coded dwarfs" are significantly more resilient compared to classical techniques such as replication. Furthermore, by examining a widely popular computation task, training large neural networks, we demonstrate how coded dwarfs can be applied to address this fundamentally nonlinear problem. Finally, we discuss practical challenges and future directions in implementing coded computing techniques on emerging and existing nonsilicon and/or non-von Neumann architectures.
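The abstract's comparison between coding and replication at equal redundancy can be illustrated with a small sketch. This is not taken from the article: it is a toy example that assumes a matrix-vector product split into two row blocks across four workers, models unreliability only as lost worker outputs (erasures), and uses hypothetical names (`REPLICATION`, `MDS_CODE`, `encode`, `decode`, `recoverable_fraction`). Both schemes store twice the original data. The coded scheme stores linear combinations of the blocks chosen so that any two surviving workers suffice.

```python
from fractions import Fraction
from itertools import combinations

# Row i = (g1, g2): worker i stores the block g1*A1 + g2*A2 (illustrative choice).
REPLICATION = [(1, 0), (1, 0), (0, 1), (0, 1)]  # two plain copies of each block
MDS_CODE = [(1, 0), (0, 1), (1, 1), (1, 2)]     # any two rows are linearly independent

def matvec(M, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def encode(A1, A2, gen):
    """Build one stored block per worker from the generator rows."""
    return [[[g1 * a + g2 * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(A1, A2)] for g1, g2 in gen]

def decode(results, gen):
    """results maps worker index -> output vector of a surviving worker.
    Returns (A1 x, A2 x), or None if the survivors do not determine both."""
    for i, j in combinations(sorted(results), 2):
        (a, b), (c, d) = gen[i], gen[j]
        det = a * d - b * c
        if det == 0:
            continue  # these two workers carry the same information
        yi, yj = results[i], results[j]
        # Invert the 2x2 system [yi; yj] = [[a, b], [c, d]] [y1; y2].
        y1 = [Fraction(d * u - b * v, det) for u, v in zip(yi, yj)]
        y2 = [Fraction(a * v - c * u, det) for u, v in zip(yi, yj)]
        return y1, y2
    return None

def recoverable_fraction(gen, failures):
    """Fraction of failure patterns of the given size that can still be decoded."""
    patterns = list(combinations(range(len(gen)), failures))
    ok = 0
    for failed in patterns:
        alive = {i: [0] for i in range(len(gen)) if i not in failed}
        ok += decode(alive, gen) is not None
    return Fraction(ok, len(patterns))

# Usage: workers 0 and 1 are lost; the two coded workers still recover everything.
A1, A2, x = [[1, 2], [3, 4]], [[5, 6], [7, 8]], [1, 1]
outputs = [matvec(block, x) for block in encode(A1, A2, MDS_CODE)]
survivors = {i: outputs[i] for i in (2, 3)}
y1, y2 = decode(survivors, MDS_CODE)  # equals A1 x and A2 x
```

Under these assumptions, the replicated layout fails whenever both copies of the same block are lost, while the coded layout survives every pattern of two lost workers. Handling silent errors rather than erasures, which the article also concerns itself with, would need additional redundancy and is not covered by this sketch.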

Bibliographic record

Source
Proceedings of the IEEE | 2020, No. 8 | pp. 1219-1234 | 16 pages
Authors
Dutta Sanghamitra; Jeong Haewon; Yang Yaoqing; Cadambe Viveck; Low Tze Meng; Grover Pulkit
Author affiliations

Carnegie Mellon Univ Dept Elect & Comp Engn Pittsburgh PA 15213 USA;

Carnegie Mellon Univ Dept Elect & Comp Engn Pittsburgh PA 15213 USA;

Univ Calif Berkeley UC Berkeley Dept Elect Engn & Comp Sci Berkeley CA 94720 USA;

Penn State Univ Dept Elect Engn State Coll PA 16801 USA;

Carnegie Mellon Univ Dept Elect & Comp Engn Pittsburgh PA 15213 USA;

Carnegie Mellon Univ Dept Elect & Comp Engn Pittsburgh PA 15213 USA;

Original format: PDF
Text language: English
Keywords
Computer architecture; Engines; Reliability; Encoding; Machine learning; Fault tolerant systems; Error correction codes; Computer errors; distributed algorithms; distributed computing; distributed processing; error correction; error correction codes; fault tolerance; fault tolerant systems; high performance computing (HPC); large-scale systems; parallel architectures; parallel machines; parallel processing; supercomputers

Date added to database: 2022-08-18 20:56:49

Similar literature

1. Arbitrary Code Injection through Self-propagating Worms in Von Neumann Architecture Devices [J]. Thanassis Giannetsos, Tassos Dimitriou, Ioannis Krontiris. The Computer Journal. 2010, No. 10

2. Overcoming device unreliability with continuous learning in a population coding based computing system [J]. Mizrahi Alice, Grollier Julie, Querlioz Damien. Journal of Applied Physics. 2018, No. 15

3. The Superstrider Architecture: Integrating Logic and Memory Towards Non-Von Neumann Computing [C]. Sriseshan Srikanth, Thomas M. Conte, Erik P. DeBenedictis. IEEE International Conference on Rebooting Computing. 2017

4. Spintronics-Based Architectures for Non-Von Neumann Computing [D]. Mondal, Ankit. 2020

5. Fog Computing and Edge Computing Architectures for Processing Data From Diabetes Devices Connected to the Medical Internet of Things [O]. David C. Klonoff. 2017

6. Smart Logic-in-Memory Architecture for Low-Power Non-Von Neumann Computing [O]. Tommaso Zanotti, Francesco Maria Puglisi, Paolo Pavan. 2020

7. Fault and Defect Tolerant Computer Architectures: Reliable Computing with Unreliable Devices [R]. Roelke, I. G. 2006
