Dynamic Warp Subdivision for Integrated Branch and Memory Divergence Tolerance

Abstract

SIMD organizations amortize the area and power of fetch, decode, and issue logic across multiple processing units in order to maximize throughput for a given area and power budget. However, throughput is reduced when a set of threads operating in lockstep (a warp) is stalled due to long-latency memory accesses. The resulting idle cycles are extremely costly. Multi-threading can hide latencies by interleaving the execution of multiple warps, but deep multi-threading using many warps dramatically increases the cost of the register files (multi-threading depth × SIMD width), and cache contention can make performance worse. Instead, intra-warp latency hiding should first be exploited. This allows threads that are ready but stalled by SIMD restrictions to use these idle cycles and reduces the need for multi-threading among warps. This paper introduces dynamic warp subdivision (DWS), which allows a single warp to occupy more than one slot in the scheduler without requiring extra register file space. Independent scheduling entities allow divergent branch paths to interleave their execution, and allow threads that hit to run ahead. The result is improved latency hiding and memory-level parallelism (MLP). We evaluate the technique on a coherent cache hierarchy with private L1 caches and a shared L2 cache. With an area overhead of less than 1%, experiments with eight data-parallel benchmarks show our technique improves performance on average by 1.7×.
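The subdivision idea in the abstract can be illustrated with a minimal software sketch. This is not the paper's hardware design: the names `WarpSplit` and `subdivide_on_miss`, the bitmask encoding of lanes, and the single-step program counter are all assumptions made for illustration. The sketch only shows how one warp might become two independently schedulable entities when some lanes hit in the cache and others miss, without copying any per-thread register state.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WarpSplit:
    """Hypothetical scheduling entity: a program counter plus a lane mask.

    Register state is not stored here; every lane keeps using its
    original registers, so a split needs no extra register file space.
    """
    pc: int
    mask: int  # bit i set means lane i belongs to this split


def subdivide_on_miss(split: WarpSplit, hit_mask: int, next_pc: int) -> List[WarpSplit]:
    """Subdivide a warp when only some of its active lanes hit in the cache.

    Lanes that hit advance to next_pc and may run ahead; lanes that miss
    stay at the current pc until their data arrives.
    """
    hits = split.mask & hit_mask
    misses = split.mask & ~hit_mask
    if misses == 0:
        # Every active lane hit: the warp advances as a whole.
        return [WarpSplit(next_pc, split.mask)]
    if hits == 0:
        # Every active lane missed: the warp stalls as a whole.
        return [WarpSplit(split.pc, split.mask)]
    # Memory divergence: two entities that the scheduler can interleave.
    return [WarpSplit(next_pc, hits), WarpSplit(split.pc, misses)]


if __name__ == "__main__":
    warp = WarpSplit(pc=10, mask=0b1111)  # four active lanes
    for s in subdivide_on_miss(warp, hit_mask=0b0101, next_pc=11):
        print(f"pc={s.pc} lanes={s.mask:04b}")
```

A branch-divergence variant would look the same, with the taken and not-taken lane masks in place of the hit and miss masks. Re-converging splits back into a full warp is left out of this sketch.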

Bibliographic details

Source
37th Annual International Symposium on Computer Architecture, 2010 | 2010 | pp. 235-246 | 12 pages
Conference location: Saint Malo (FR)
Authors
Jiayuan Meng; David Tarjan; Kevin Skadron
Author affiliations

Department of Computer Science University of Virginia;

Department of Computer Science University of Virginia;

Department of Computer Science University of Virginia;

Original format: PDF
Language: English
Chinese Library Classification: general structure, system architecture
Keywords
SIMD; branch divergence; latency hiding; memory divergence; warp
