
Tuning Hardware and Software for Multiprocessors.

Abstract

Technology scaling trends have enabled the exponential growth of computing power. However, the performance of communication subsystems scales less aggressively. This means that an application constrained by memory/interconnect performance will not be able to use the available computing power efficiently; in fact, technology scaling will make this efficiency even worse. This problem can be alleviated if algorithms minimize communication. To this end, we describe communication-avoiding algorithms and highly optimized implementations of a sparse linear algebra kernel called "matrix powers". Results show up to 2.3x improvement in performance over the naive algorithms on modern architectures. Our multi-core implementation of matrix powers enables us to develop a communication-avoiding iterative solver for sparse linear systems that is up to 2.1x faster than a conventional Generalized Minimal Residual method (GMRES) implementation.

Another problem plaguing the supercomputer industry is the power bottleneck. Power has, in fact, become the pre-eminent design constraint for future high-performance computing systems, which is why computational efficiency is being emphasized over peak performance alone. Static benchmark codes have traditionally been used to find architectures that are optimal with respect to specific metrics. Unfortunately, because compilers generate suboptimal code, benchmark performance can be a poor indicator of the performance potential of architecture design points. Therefore, we present hardware/software co-tuning as a novel approach for system design. In co-tuning, traditional architecture space exploration is tightly coupled with software auto-tuning to deliver substantial improvements in area and power efficiency. We demonstrate co-tuning by exploring the parameter space of a Tensilica Xtensa-based multi-processor running three of the most heavily used kernels in scientific computing, each with widely varying micro-architectural requirements: sparse matrix-vector multiplication, stencil-based computations, and general matrix-matrix multiplication. Results demonstrate that co-tuning improves hardware area and power efficiency by up to 3x and 2.4x, respectively.
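
As a rough illustration of the "matrix powers" kernel named in the abstract, the sketch below computes the vectors x, Ax, ..., A^k x in two ways: a naive version that performs k separate sparse matrix-vector products, and a blocked version that, for each block of rows, gathers the entries it depends on and computes all k levels locally with some redundant work. This is a minimal pure-Python sketch under stated assumptions, not the thesis's implementation: the CSR layout, the function names (`tridiagonal_csr`, `spmv_csr`, `matrix_powers_naive`, `matrix_powers_blocked`), and the `block_size` parameter are all hypothetical choices made for illustration.

```python
def tridiagonal_csr(n):
    """Build a 1D Laplacian (2 on the diagonal, -1 off it) in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def spmv_csr(indptr, indices, data, x):
    """Sparse matrix-vector product y = A x for a CSR matrix."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for p in range(indptr[i], indptr[i + 1]):
            s += data[p] * x[indices[p]]
        y[i] = s
    return y


def matrix_powers_naive(indptr, indices, data, x, k):
    """Return [x, Ax, ..., A^k x] using k full passes over the matrix."""
    basis = [list(x)]
    for _ in range(k):
        basis.append(spmv_csr(indptr, indices, data, basis[-1]))
    return basis


def matrix_powers_blocked(indptr, indices, data, x, k, block_size):
    """Return [x, Ax, ..., A^k x], computing all k levels one row block at a time.

    Each block first collects the rows it transitively depends on, then
    recomputes those "ghost" entries locally instead of exchanging them
    between levels.
    """
    n = len(indptr) - 1
    basis = [list(x)] + [[0.0] * n for _ in range(k)]
    for lo in range(0, n, block_size):
        hi = min(lo + block_size, n)
        # need[j] = rows whose level-j value is required to finish this block.
        need = [None] * (k + 1)
        need[k] = set(range(lo, hi))
        for j in range(k, 0, -1):
            deps = set(need[j])
            for i in need[j]:
                deps.update(indices[indptr[i]:indptr[i + 1]])
            need[j - 1] = deps
        local = {i: x[i] for i in need[0]}
        for j in range(1, k + 1):
            nxt = {}
            for i in need[j]:
                nxt[i] = sum(data[p] * local[indices[p]]
                             for p in range(indptr[i], indptr[i + 1]))
            local = nxt
            for i in range(lo, hi):
                basis[j][i] = local[i]
    return basis


if __name__ == "__main__":
    indptr, indices, data = tridiagonal_csr(8)
    x = [1.0] * 8
    naive = matrix_powers_naive(indptr, indices, data, x, 3)
    blocked = matrix_powers_blocked(indptr, indices, data, x, 3, block_size=4)
    print(naive == blocked)
```

Both functions produce the same vectors; the blocked variant only changes the order of the work, so that each block's slice of the matrix and source vector is touched once for all k levels rather than once per level.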

Bibliographic record

  • Author: Mohiyuddin, Marghoob
  • Author affiliation: University of California, Berkeley
  • Degree-granting institution: University of California, Berkeley
  • Subject: Engineering Computer; Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 182
  • Original format: PDF
  • Language: English

