Demand look-ahead memory access scheduling for 3D graphics processing units

Chih-Chieh Hsiao; Min-Jen Lo; Slo-Li Chu


Abstract

With the rapidly growing complexity of 3D applications, the memory subsystem has become the most bandwidth-exhausting bottleneck in a Graphics Processing Unit (GPU). Producing realistic images takes tens to hundreds of thousands of primitives. Each primitive in turn generates thousands of pixels, which are computed by shaders that apply special effects and may blend multiple texture pixels fetched from external memory to obtain a final color. To hide the long latency of texture operations, shaders are usually highly multithreaded to increase their throughput. However, conventional memory scheduling mechanisms are unaware of the producer-consumer relationship between primitives and pixels: they either assume that all initiators are independent or use a fixed priority scheme. This paper proposes Demand Look-Ahead (DLA) memory access scheduling, which dynamically generates priorities for the memory request scheduler based on the status of each unit in the GPU. By considering the producer-consumer relationship, the proposed mechanism reschedules the most urgent requests to be serviced first. Experimental results show that DLA improves FPS and IPC by 1.47% and 1.44%, respectively, over First-Ready First-Come-First-Serve (FR-FCFS). Integrating DLA with Bank-level Parallelism Awareness (BPA) yields DLA-BPA, which improves FPS and IPC by 7.28% and 6.55%, respectively. Furthermore, DLA-BPA improves shader thread performance by 22.06% and increases the attainable bandwidth by 5.91%.
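
The contrast between the FR-FCFS baseline and a demand-aware policy can be illustrated with a minimal sketch. FR-FCFS, as commonly described, services requests that hit an already open DRAM row first and otherwise picks the oldest request. The abstract does not say how DLA derives its priorities, so the `urgency` table, the `source` labels, and the `demand_aware` function below are hypothetical stand-ins for "priority generated from unit status", not the authors' mechanism.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Request:
    arrival: int   # arrival order; lower means older
    bank: int      # DRAM bank the request targets
    row: int       # DRAM row the request targets
    source: str    # hypothetical label for the issuing GPU unit


def fr_fcfs(queue: List[Request], open_rows: Dict[int, int]) -> Optional[Request]:
    """Pick a row-buffer hit if one exists, otherwise the oldest request."""
    if not queue:
        return None
    # False sorts before True, so row hits win; ties fall back to age.
    return min(queue, key=lambda r: (open_rows.get(r.bank) != r.row, r.arrival))


def demand_aware(queue: List[Request], open_rows: Dict[int, int],
                 urgency: Dict[str, int]) -> Optional[Request]:
    """Assumed variant: rank by per-unit urgency first, then apply FR-FCFS."""
    if not queue:
        return None
    return min(
        queue,
        key=lambda r: (-urgency.get(r.source, 0),
                       open_rows.get(r.bank) != r.row,
                       r.arrival),
    )


queue = [
    Request(arrival=0, bank=0, row=5, source="texture"),
    Request(arrival=1, bank=0, row=7, source="geometry"),
    Request(arrival=2, bank=1, row=3, source="texture"),
]
open_rows = {0: 7, 1: 9}                 # bank 0 has row 7 open, bank 1 has row 9 open
urgency = {"texture": 2, "geometry": 1}  # made-up values for illustration

print(fr_fcfs(queue, open_rows).arrival)                # row hit wins
print(demand_aware(queue, open_rows, urgency).arrival)  # most urgent unit wins
```

In this toy queue FR-FCFS selects the geometry request because it hits the open row, while the demand-aware variant selects the oldest texture request because its unit is marked as more urgent.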


Bibliographic details

Source
Multimedia Tools and Applications | 2014, Issue 3 | pp. 1391-1416 | 26 pages
Authors
Chih-Chieh Hsiao; Min-Jen Lo; Slo-Li Chu
Author affiliations

Department of Information and Computer Engineering, Chung Yuan Christian University, 200, Chung Pei Rd., Chung Li 32023, Taiwan;

Original format: PDF
Language: English
Keywords
Demand look-ahead; GPU; Graphics rendering; Memory access scheduling;


Similar literature

1. Visualizing 3D/4D environmental data using many-core graphics processing units (GPUs) and multi-core central processing units (CPUs) [J]. Jing Li, Yunfeng Jiang, Chaowei Yang. Computers & Geosciences, 2013, Issue SEPa.
2. Evaluation of selected resource allocation and scheduling methods in heterogeneous many-core processors and graphics processing units [J]. Milosz Ciznicki, Krzysztof Kurowski, Jan Weglarz. Foundations of Computing and Decision Sciences, 2014, Issue 4.
3. Accelerating in-memory transaction processing using general purpose graphics processing units [J]. Gao Lan, Xu Yunlong, Wang Rui. Future Generation Computer Systems, 2019, Issue AUGa.
4. On the optimization of memory access to increase the performance of spatial preprocessing techniques on graphics processing units [C]. J. Delgado, G. Martín, J. Plaza. IEEE International Geoscience and Remote Sensing Symposium, 2016.
5. Reducing irregularities in control flow and memory access on graphics processing unit architectures [D]. King, James Sokhom. 2017.
6. Use of a graphics processing unit (GPU) to facilitate real-time 3D graphic presentation of the patient skin-dose distribution during fluoroscopic interventional procedures [O]. Vijay Rana, Stephen Rudin, Daniel R. Bednarek.
7. Global Memory Access Modelling for Efficient Implementation of the Lattice Boltzmann Method on Graphics Processing Units [O]. Christian Obrecht, Frédéric Kuznik, Bernard Tourancheau. 2011.
