
Towards dense linear algebra for hybrid GPU accelerated manycore systems



Abstract

If multicore is a disruptive technology, try to imagine hybrid multicore systems enhanced with accelerators! This is happening today as accelerators, in particular Graphical Processing Units (GPUs), are steadily making their way into the high performance computing (HPC) world. We highlight the trends leading to the idea of hybrid manycore/GPU systems, and we present a set of techniques that can be used to efficiently program them. The presentation is in the context of Dense Linear Algebra (DLA), a major building block for many scientific computing applications. We motivate the need for new algorithms that would split the computation in a way that would fully exploit the power that each of the hybrid components offers. As the area of hybrid multicore/GPU computing is still in its infancy, we also argue for its importance in view of what future architectures may look like. We therefore envision the need for a DLA library similar to LAPACK but for hybrid manycore/GPU systems. We illustrate the main ideas with an LU factorization algorithm where particular techniques are used to reduce the amount of pivoting, resulting in an algorithm achieving up to 388 GFlop/s for single and up to 99.4 GFlop/s for double precision factorization on a hybrid Intel Xeon (2x4 cores @ 2.33 GHz) - NVIDIA GeForce GTX 280 (240 cores @ 1.30 GHz) system.
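
To make the idea of splitting an LU factorization across hybrid components more concrete, the following is a minimal sketch of a blocked, right-looking LU factorization in plain Python. It is not the algorithm from the paper: it uses ordinary partial pivoting rather than the pivoting-reduction techniques the abstract mentions, it runs entirely on the CPU, and the assignment of the panel step to the CPU and the trailing update to the GPU is an assumption made for illustration, since the abstract does not say how the work is divided. The names `lu_blocked`, `multiply_lu` and the block size `nb` are hypothetical.

```python
def lu_blocked(a, nb=2):
    """Blocked right-looking LU with partial pivoting.

    Returns (lu, perm): lu holds L (unit diagonal, below) and U (on and
    above the diagonal); perm[i] is the original index of row i, so that
    row i of L*U equals row perm[i] of the input.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Panel factorization: pivot searches and many small operations.
        # Assumption: this latency-bound step would stay on the CPU.
        for k in range(k0, k1):
            p = max(range(k, n), key=lambda i: abs(lu[i][k]))
            if lu[p][k] == 0.0:
                raise ValueError("matrix is singular")
            if p != k:
                lu[k], lu[p] = lu[p], lu[k]
                perm[k], perm[p] = perm[p], perm[k]
            for i in range(k + 1, n):
                lu[i][k] /= lu[k][k]
                for j in range(k + 1, k1):
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # Row block of U: forward substitution with the unit-lower panel.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # Trailing update A22 -= L21 * U12: one large matrix product.
        # Assumption: this throughput-bound step would go to the GPU.
        for i in range(k1, n):
            for j in range(k1, n):
                s = 0.0
                for k in range(k0, k1):
                    s += lu[i][k] * lu[k][j]
                lu[i][j] -= s
    return lu, perm


def multiply_lu(lu):
    """Multiply the packed factors back together to get L*U."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                s += l_ik * lu[k][j]
            out[i][j] = s
    return out
```

Calling `lu_blocked(a, nb)` and comparing `multiply_lu(lu)` row by row against the rows of `a` reordered by `perm` checks the factorization. In a sketch like this, every pivot search touches a whole column and every row swap touches a whole row, which suggests why reducing the amount of pivoting could matter on a system where the two components exchange data.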
