Tuning Hardware and Software for Multiprocessors.


Abstract

Technology scaling trends have enabled exponential growth in computing power. However, the performance of communication subsystems (memory and interconnect) scales less aggressively. As a result, an application constrained by memory or interconnect performance cannot use the available computing power efficiently, and continued technology scaling will make this inefficiency worse. The problem can be alleviated by algorithms that minimize communication. To this end, we describe communication-avoiding algorithms and highly optimized implementations of a sparse linear algebra kernel called "matrix powers". Results show up to a 2.3x performance improvement over the naive algorithms on modern architectures. Our multicore implementation of matrix powers enables us to develop a communication-avoiding iterative solver for sparse linear systems that is up to 2.1x faster than a conventional Generalized Minimal Residual method (GMRES) implementation.

Another problem facing the supercomputing industry is the power bottleneck: power has become the pre-eminent design constraint for future high-performance computing systems, which is why computational efficiency is now emphasized over peak performance alone. Static benchmark codes have traditionally been used to find architectures that are optimal with respect to specific metrics. Unfortunately, because compilers generate suboptimal code, benchmark performance can be a poor indicator of the performance potential of an architecture design point. We therefore present hardware/software co-tuning as a novel approach to system design. In co-tuning, traditional architecture design-space exploration is tightly coupled with software auto-tuning, delivering substantial improvements in area and power efficiency.
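The structure of co-tuning can be sketched as a nested search: for each candidate hardware configuration, an auto-tuner searches software variants, and hardware points are then compared by the best efficiency their tuned code reaches rather than by one fixed benchmark code. Everything in this sketch is a hypothetical assumption made for illustration: the single hardware knob (per-core cache size), the single software knob (tile size), and the area and performance models are invented and are not the models or parameters used in the dissertation.

```python
# Toy model of hardware/software co-tuning. Every knob, cost, and formula here
# is hypothetical and chosen only to illustrate the shape of the search.

CACHE_KB = [16, 32, 64, 128]   # hardware knob: per-core cache size (KB)
CORE_AREA = 1.0                # hypothetical fixed core area (mm^2)
AREA_PER_KB = 0.01             # hypothetical cache area cost (mm^2 per KB)
BLOCK_SIZES = [8, 16, 32, 64]  # software knob: tile size explored by the auto-tuner
DEFAULT_BLOCK = 64             # tile size baked into a fixed "static benchmark"


def toy_throughput(cache_kb, block):
    """Hypothetical performance model (relative units) for a tiled kernel."""
    working_set_kb = 3 * block * block * 8 / 1024  # three tiles of doubles, GEMM-style
    overhead_factor = 1.0 - 4.0 / block            # larger tiles amortize loop overhead
    return overhead_factor * (1.0 if working_set_kb <= cache_kb else 0.3)


def area(cache_kb):
    """Hypothetical core-plus-cache area (mm^2)."""
    return CORE_AREA + AREA_PER_KB * cache_kb


def static_benchmark_choice():
    """Conventional flow: rank hardware points using one fixed code version."""
    return max(CACHE_KB, key=lambda c: toy_throughput(c, DEFAULT_BLOCK) / area(c))


def co_tuned_choice():
    """Co-tuning: auto-tune the software at every hardware point, then rank
    hardware points by the best area efficiency each one can reach."""
    best = None
    for cache_kb in CACHE_KB:
        block = max(BLOCK_SIZES, key=lambda blk: toy_throughput(cache_kb, blk))
        efficiency = toy_throughput(cache_kb, block) / area(cache_kb)
        if best is None or efficiency > best[2]:
            best = (cache_kb, block, efficiency)
    return best


if __name__ == "__main__":
    print("static benchmark picks cache_kb =", static_benchmark_choice())
    print("co-tuning picks (cache_kb, block, efficiency) =", co_tuned_choice())
```

Under these made-up numbers the static flow favours the largest cache (128 KB), because the fixed 64-wide tile only fits there, while co-tuning finds that a 32 KB cache with a retuned 32-wide tile delivers more throughput per unit area. The point of the toy is only that a fixed benchmark can steer the hardware choice toward a point that tuned software does not need.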
We demonstrate co-tuning by exploring the parameter space of a Tensilica Xtensa-based multiprocessor running three of the most heavily used kernels in scientific computing, which have widely varying micro-architectural requirements: sparse matrix-vector multiplication, stencil-based computations, and general matrix-matrix multiplication. Results demonstrate that co-tuning improves hardware area efficiency by up to 3x and power efficiency by up to 2.4x.
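The abstract does not spell out how the matrix powers kernel avoids communication. A minimal sketch of one common formulation of the idea follows, assuming a simple tridiagonal matrix (a 1D three-point stencil) instead of a general sparse matrix, and plain sequential Python instead of an optimized multicore kernel. The naive version computes [x, Ax, ..., A^k x] with k separate SpMV passes over the data; the blocked version reads each block of x plus a "ghost zone" of k neighbouring entries on each side once, then computes all k powers locally, trading some redundant arithmetic on the ghost entries for fewer passes over memory. The function names are hypothetical and this is not the dissertation's implementation.

```python
def stencil_spmv(x):
    """y = A x for A = tridiag(1, 2, 1); entries outside x are treated as 0."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = left + 2.0 * x[i] + right
    return y


def matrix_powers_naive(x, k):
    """[x, Ax, ..., A^k x] via k separate SpMVs: the vector is re-read k times."""
    powers = [list(x)]
    for _ in range(k):
        powers.append(stencil_spmv(powers[-1]))
    return powers


def matrix_powers_blocked(x, k, block):
    """Same result, computed block by block.

    Each block reads its own rows plus up to k ghost entries on each side
    exactly once, then applies A k times locally. Errors introduced at an
    artificial (non-global) edge of the local slice move inward one entry per
    step, so after k steps they have not reached the rows the block owns.
    """
    n = len(x)
    powers = [list(x)] + [[0.0] * n for _ in range(k)]
    for start in range(0, n, block):
        stop = min(start + block, n)
        lo, hi = max(start - k, 0), min(stop + k, n)
        local = x[lo:hi]                    # single read of block + ghost zone
        for j in range(1, k + 1):
            local = stencil_spmv(local)     # redundant work on ghost entries
            powers[j][start:stop] = local[start - lo:stop - lo]
    return powers


if __name__ == "__main__":
    x = [float(i % 5) for i in range(12)]
    print(matrix_powers_naive(x, 3) == matrix_powers_blocked(x, 3, block=4))  # True
```

The ghost-zone width equals k because each application of this stencil propagates information by one entry; a general sparse matrix would need the k-level graph neighbourhood of each block instead, which is where most of the engineering effort in a real implementation goes.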


Bibliographic Details

Author: Mohiyuddin, Marghoob
Affiliation: University of California, Berkeley
Degree-granting institution: University of California, Berkeley
Subjects: Engineering, Computer; Computer Science
Degree: Ph.D.
Year: 2012
Pages: 182
Original format: PDF
Language: English

Similar Literature

1. Hardware approaches to cache coherence in shared-memory multiprocessors. 2 [J]. Tomasevic M., Milutinovic V. IEEE Micro, 1994, No. 6.
2. Hardware and software systems for the determination of charged particle parameters in low pressure plasmas using impedance-tuned Langmuir probes [J]. Yuancai Ye, R. Kenneth Marcus. Spectrochimica Acta, Part B: Atomic Spectroscopy, 1997, No. 14.
3. Co-Z ECC scalar multiplications for hardware, software and hardware-software co-design on embedded systems [J]. Brian Baldwin, Raveen R. Goundar, Mark Hamilton. Journal of Cryptographic Engineering, 2012, No. 4.
4. Hardware/software co-design techniques for compass search self-tuning PID controller in DC drive applications [C]. Nawfal N. Al-Saaty, Maher Algreer, Matthew Armstrong. IEEE International Symposium on Industrial Electronics, 2017.
5. A hardware/software co-design architecture for thermal, power, and reliability management in chip multiprocessors [D]. Khan, Omer. 2010.
6. Concepts Issues and Standards. Standards for Medical Hardware Software and Information Exchange: MIB Software-Semantic Model ASN.1 Messaging for a Pulse Oximeter [O]. Helen W. Gottschalk, Jan Wittenber. 1990.
7. Combining Software and Hardware Monitoring for Improved Power and Performance Tuning [O]. Eric Chi, A. Michael Salem, R. Iris Bahar. 2003.
8. Cache group scheme for hardware-controlled cache coherence and the general need for hardware coherence control in large-scale multiprocessors [R]. Hoag, J. E. 1991.
