Conference: International Symposium on Multidisciplinary Studies and Innovative Technologies

Pipette: Improving Core Utilization on Irregular Applications through Intra-Core Pipeline Parallelism

Abstract

Applications with irregular memory accesses and control flow, such as graph algorithms and sparse linear algebra, use high-performance cores very poorly and suffer from dismal IPC. Instruction latencies are so large that even SMT cores running multiple data-parallel threads suffer poor utilization.

We find that irregular applications have abundant pipeline parallelism that can be used to boost utilization: these applications can be structured as a pipeline of stages decoupled by queues. Queues hide latency very effectively when they allow producer stages to run far ahead of consumers. Prior work has proposed decoupled architectures, such as DAE and streaming multicores, that implement queues in hardware to exploit pipeline parallelism. Unfortunately, prior decoupled architectures are ill-suited to irregular applications: they lack the control mechanisms needed to achieve decoupling, and they target decoupling across cores but suffer from poor utilization within each core due to load imbalance across stages.

We present Pipette, a technique that enables cheap pipeline parallelism within each core. Pipette decouples threads within the core using architecturally visible queues. Pipette's ISA features control mechanisms that allow effective decoupling under irregular control flow. By time-multiplexing stages on the same core, Pipette avoids load imbalance and achieves high core IPC. Pipette's novel implementation uses the physical register file to implement queues at very low cost, putting otherwise-idle registers to use. Pipette also adds cheap hardware to accelerate common access patterns, enabling fine-grain composition of accelerated accesses and general-purpose computation. As a result, Pipette outperforms data-parallel implementations of several challenging irregular applications by gmean 1.9× (and up to 3.9×).
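To make "a pipeline of stages decoupled by queues" concrete, here is a minimal software sketch. It assumes a CSR sparse matrix-vector multiply (SpMV) as the irregular kernel. Python threads and bounded `queue.Queue` objects stand in for Pipette's architecturally visible queues. The function name `spmv_pipelined`, the three-stage split, the queue depth, and the `DONE` sentinel are all hypothetical illustration choices. They are not Pipette's ISA, its control mechanism, or the paper's code. The point is the structure. Each stage talks only to its neighbors through queues, so the traversal and fetch stages can run ahead of the stage that performs the indirect load `x[col]`.

```python
import queue
import threading

# Hypothetical end-of-stream marker; a simple stand-in for the "control
# mechanisms" the abstract mentions, not Pipette's actual design.
DONE = object()


def spmv_pipelined(row_ptr, col_idx, vals, x, depth=64):
    """Compute y = A @ x for a CSR matrix A using three queue-decoupled stages.

    Stage 1 (traverse): walks row_ptr and emits one (row, start, end) range per row.
    Stage 2 (fetch):    expands each range into (row, col, val) nonzeros.
    Stage 3 (compute):  gathers x[col] (the irregular access) and accumulates y[row].
    Bounded queues of size `depth` let producers run ahead of consumers.
    """
    n_rows = len(row_ptr) - 1
    ranges = queue.Queue(maxsize=depth)    # stage 1 -> stage 2
    nonzeros = queue.Queue(maxsize=depth)  # stage 2 -> stage 3
    y = [0.0] * n_rows                     # written only by stage 3

    def traverse():
        for r in range(n_rows):
            ranges.put((r, row_ptr[r], row_ptr[r + 1]))
        ranges.put(DONE)  # signal downstream that no more rows follow

    def fetch():
        while True:
            item = ranges.get()
            if item is DONE:
                nonzeros.put(DONE)  # forward the end-of-stream marker
                return
            r, start, end = item
            # Data-dependent trip count: empty rows produce nothing.
            for k in range(start, end):
                nonzeros.put((r, col_idx[k], vals[k]))

    def compute():
        while True:
            item = nonzeros.get()
            if item is DONE:
                return
            r, c, v = item
            y[r] += v * x[c]  # indirect load x[col_idx[k]]

    stages = [threading.Thread(target=f) for f in (traverse, fetch, compute)]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return y


if __name__ == "__main__":
    # A = [[1, 0, 2], [0, 0, 0], [0, 3, 4]] in CSR form; row 1 is empty.
    row_ptr = [0, 2, 2, 4]
    col_idx = [0, 2, 1, 2]
    vals = [1.0, 2.0, 3.0, 4.0]
    x = [1.0, 2.0, 3.0]
    print(spmv_pipelined(row_ptr, col_idx, vals, x))  # → [7.0, 0.0, 18.0]
```

Under CPython the stages are interleaved rather than executed in parallel. The sketch therefore shows only the decoupled program structure. It does not model the latency hiding, the register-file queue implementation, or the access-pattern hardware that the abstract describes.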
