A SIMD-efficient 14 instruction shader program for high-throughput microtriangle rasterization

Jordi Roca; Victor Moya; Carlos Gonzalez; Vicente Escandell; Albert Murciego; Agustin Fernandez; Roger Espasa

Abstract

This paper shows that it is possible to efficiently break the barrier of 1 triangle/clock rasterization rate for microtriangles in modern GPU architectures. The fixed throughput of the special-purpose culling and triangle setup stages of the classic pipeline limits the GPU's ability to scale to rasterizing many triangles in parallel when these cover very few pixels. In contrast, the growing shader core counts and GFLOPs of modern GPUs clearly suggest parallelizing this computation entirely across multiple shader threads, making use of the powerful wide-ALU instructions. In this paper, we present a very efficient SIMD-like rasterization code targeted at very small triangles that scales very well with the number of shader cores and has higher performance than traditional edge-equation-based algorithms. We have extended the ATTILA GPU shader ISA (del Barrio et al. in IEEE International Symposium on Performance Analysis of Systems and Software, pp. 231-241, 2006) with two fixed-point instructions to meet the rasterization precision requirement. This paper also introduces a novel subpixel Bounding Box size optimization that adjusts the bounds much more finely, which is critical for small triangles, and doubles the 2×2-pixel stamp test efficiency. The proposed shader rasterization program can run on top of the original pixel shader program in such a way that selected fragments are rasterized, attribute interpolated and pixel shaded in the same pass. Our results show that our technique yields better performance than a classic rasterizer at 8 or more shader cores, with speedups as high as 4× for 16 shader cores.
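For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of a conventional edge-equation coverage test in fixed point. It is not the paper's 14-instruction shader program and does not reproduce its Bounding Box optimization or the ATTILA ISA extensions. The 4 fractional subpixel bits, the function names, and the omission of a top-left fill rule are all assumptions made for illustration.

```python
SUBPIXEL_BITS = 4              # assumption: 1/16-pixel precision
ONE = 1 << SUBPIXEL_BITS       # one pixel in fixed-point units
HALF = ONE // 2                # offset of a pixel center


def to_fixed(v):
    """Snap a floating-point coordinate to the fixed-point subpixel grid."""
    return int(round(v * ONE))


def edge(ax, ay, bx, by, px, py):
    """Edge function: positive when p lies to the left of the edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covered_pixels(v0, v1, v2):
    """Return the (x, y) pixels whose centers fall inside the triangle.

    Simplification: samples exactly on an edge count as covered
    (no top-left fill rule is applied).
    """
    (x0, y0), (x1, y1), (x2, y2) = [(to_fixed(x), to_fixed(y))
                                    for x, y in (v0, v1, v2)]
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return []              # degenerate triangle
    if area < 0:               # enforce a consistent winding order
        x1, y1, x2, y2 = x2, y2, x1, y1

    # Bounding box tightened to the pixel centers it actually contains.
    # For a tiny triangle this range is often empty, so no edge test runs.
    first_x = (min(x0, x1, x2) - HALF + ONE - 1) >> SUBPIXEL_BITS
    last_x = (max(x0, x1, x2) - HALF) >> SUBPIXEL_BITS
    first_y = (min(y0, y1, y2) - HALF + ONE - 1) >> SUBPIXEL_BITS
    last_y = (max(y0, y1, y2) - HALF) >> SUBPIXEL_BITS

    covered = []
    for py in range(first_y, last_y + 1):
        for px in range(first_x, last_x + 1):
            cx, cy = px * ONE + HALF, py * ONE + HALF   # pixel center
            if (edge(x0, y0, x1, y1, cx, cy) >= 0 and
                    edge(x1, y1, x2, y2, cx, cy) >= 0 and
                    edge(x2, y2, x0, y0, cx, cy) >= 0):
                covered.append((px, py))
    return covered
```

In this sketch, a triangle such as (0.1, 0.1), (0.4, 0.1), (0.1, 0.4) contains no pixel center, so the tightened bounding box is empty and the triangle is discarded before any edge function is evaluated. This illustrates why the abstract calls bounding-box precision critical for small triangles.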

Bibliographic details

Source
The Visual Computer | 2010, Issue 8 | pp. 707-719 | 13 pages
Authors
Jordi Roca; Victor Moya; Carlos Gonzalez; Vicente Escandell; Albert Murciego; Agustin Fernandez; Roger Espasa
Affiliations

Computer Architecture Department (UPC), Barcelona, Spain;

Computer Architecture Department (UPC), Barcelona, Spain;

Computer Architecture Department (UPC), Barcelona, Spain;

Computer Architecture Department (UPC), Barcelona, Spain;

Computer Architecture Department (UPC), Barcelona, Spain;

Computer Architecture Department (UPC), Barcelona, Spain;

Intel Barcelona, Barcelona, Spain;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
microtriangle rasterization; GPU rendering; shader performance;


