Pipette: Improving Core Utilization on Irregular Applications through Intra-Core Pipeline Parallelism



Abstract

Applications with irregular memory accesses and control flow, such as graph algorithms and sparse linear algebra, use high-performance cores very poorly and suffer from dismal IPC. Instruction latencies are so large that even SMT cores running multiple data-parallel threads suffer poor utilization.

We find that irregular applications have abundant pipeline parallelism that can be used to boost utilization: these applications can be structured as a pipeline of stages decoupled by queues. Queues hide latency very effectively when they allow producer stages to run far ahead of consumers. Prior work has proposed decoupled architectures, such as DAE and streaming multicores, that implement queues in hardware to exploit pipeline parallelism. Unfortunately, prior decoupled architectures are ill-suited to irregular applications, as they lack the control mechanisms needed to achieve decoupling, and target decoupling across cores but suffer from poor utilization within each core due to load imbalance across stages.

We present Pipette, a technique that enables cheap pipeline parallelism within each core. Pipette decouples threads within the core using architecturally visible queues. Pipette’s ISA features control mechanisms that allow effective decoupling under irregular control flow. By time-multiplexing stages on the same core, Pipette avoids load imbalance and achieves high core IPC. Pipette’s novel implementation uses the physical register file to implement queues at very low cost, putting otherwise-idle registers to use. Pipette also adds cheap hardware to accelerate common access patterns, enabling fine-grain composition of accelerated accesses and general-purpose computation. As a result, Pipette outperforms data-parallel implementations of several challenging irregular applications by gmean 1.9× (and up to 3.9×).
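The idea of structuring an irregular computation as stages decoupled by queues can be illustrated with a small software analogy. The sketch below is not Pipette's ISA or hardware: it assumes ordinary Python threads and bounded `queue.Queue` objects standing in for the architecturally visible queues, and it uses an indirect sum (adding up `values[i]` for each `i` in an index list) as a stand-in irregular workload. All names (`index_stage`, `load_stage`, `sum_stage`, `run_pipeline`, `depth`) are hypothetical and chosen for illustration only.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def index_stage(indices, out_q):
    """Producer stage: streams indices, running ahead until the queue fills."""
    for i in indices:
        out_q.put(i)  # blocks only when the bounded queue is full
    out_q.put(DONE)


def load_stage(values, in_q, out_q):
    """Middle stage: performs the irregular, data-dependent access."""
    while True:
        i = in_q.get()
        if i is DONE:
            out_q.put(DONE)  # forward end-of-stream to the consumer
            return
        out_q.put(values[i])


def sum_stage(in_q, result):
    """Consumer stage: accumulates the loaded values."""
    total = 0
    while True:
        v = in_q.get()
        if v is DONE:
            break
        total += v
    result.append(total)


def run_pipeline(indices, values, depth=8):
    """Run the three stages concurrently, decoupled by two bounded queues."""
    q1 = queue.Queue(maxsize=depth)
    q2 = queue.Queue(maxsize=depth)
    result = []
    threads = [
        threading.Thread(target=index_stage, args=(indices, q1)),
        threading.Thread(target=load_stage, args=(values, q1, q2)),
        threading.Thread(target=sum_stage, args=(q2, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

In this analogy, `depth` bounds how far a producer stage may run ahead of its consumer. The sentinel is a simple software way to signal end-of-stream across stages; it is an assumption of this sketch and says nothing about how Pipette's control mechanisms are actually defined.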


Bibliographic details

Source
International Symposium on Multidisciplinary Studies and Innovative Technologies | 2020 | pp. 596-608 | 13 pages

Authors
Quan M. Nguyen; Daniel Sanchez

Original format
PDF

Keywords
n/a

