If multicore is a disruptive technology, try to imagine hybrid multicore systems enhanced with accelerators! This is happening today as accelerators, in particular Graphical Processing Units (GPUs), are steadily making their way into the high performance computing (HPC) world. We highlight the trends leading to the idea of hybrid manycore/GPU systems, and we present a set of techniques that can be used to efficiently program them. The presentation is in the context of Dense Linear Algebra (DLA), a major building block for many scientific computing applications. We motivate the need for new algorithms that would split the computation in a way that would fully exploit the power that each of the hybrid components offers. As the area of hybrid multicore/GPU computing is still in its infancy, we also argue for its importance in view of what future architectures may look like. We therefore envision the need for a DLA library similar to LAPACK but for hybrid manycore/GPU systems. We illustrate the main ideas with an LU factorization algorithm where particular techniques are used to reduce the amount of pivoting, resulting in an algorithm achieving up to 388 GFlop/s for single and up to 99.4 GFlop/s for double precision factorization on a hybrid Intel Xeon (2x4 cores @ 2.33 GHz) – NVIDIA GeForce GTX 280 (240 cores @ 1.30 GHz) system.