Conference: IEEE International Symposium on High Performance Computer Architecture

Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level

Abstract

Modern GPU frameworks use a two-phase compilation approach. Kernels written in a high-level language are first compiled to an implementation-agnostic intermediate language (IL), then finalized to the machine ISA only when the target GPU hardware is known. Most GPU microarchitecture simulators available to academics execute IL instructions, because there is substantially less functional state associated with those instructions and, in some situations, the machine ISA's intellectual property may not be publicly disclosed. In this paper, we demonstrate the pitfalls of evaluating GPUs using this higher-level abstraction, and make the case that several important microarchitecture interactions are only visible when executing lower-level instructions. Our analysis shows that, given identical application source code and GPU microarchitecture models, execution behavior differs significantly depending on the instruction set abstraction. For example, the dynamic instruction count of the machine ISA is nearly 2× that of the IL on average, but contention for vector registers is reduced by 3× due to the optimized resource utilization. In addition, our analysis highlights the deficiencies of using IL to model instruction fetching, control divergence, and value similarity. Finally, we show that simulating IL instructions adds 33% error relative to simulating the machine ISA when absolute runtimes are compared to real hardware.
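The two-phase flow described in the abstract can be pictured with a small toy model. The sketch below is only an illustration under stated assumptions: the opcode names (`il_ld`, `isa_load`, and so on), the expansion table, and the helper functions are hypothetical and do not come from the paper or from any real IL or ISA. It shows how one IL instruction may lower to several machine instructions at finalization time, so the same kernel can yield different dynamic instruction counts depending on the abstraction level that is simulated.

```python
# Toy model of IL-to-ISA finalization. Every opcode name and expansion below
# is a made-up assumption for illustration, not a real IL or machine ISA.
TOY_EXPANSION = {
    "il_ld":  ["isa_addr_calc", "isa_load", "isa_waitcnt"],
    "il_mad": ["isa_mad"],
    "il_cbr": ["isa_cmp", "isa_branch"],
    "il_st":  ["isa_addr_calc", "isa_store"],
}


def finalize(il_trace):
    """Second compilation phase: lower each IL instruction to ISA instructions."""
    isa_trace = []
    for op in il_trace:
        isa_trace.extend(TOY_EXPANSION[op])
    return isa_trace


def expansion_ratio(il_trace):
    """Dynamic ISA instruction count divided by dynamic IL instruction count."""
    return len(finalize(il_trace)) / len(il_trace)


if __name__ == "__main__":
    il_trace = ["il_ld", "il_mad", "il_cbr", "il_st"]
    print(finalize(il_trace))
    print(expansion_ratio(il_trace))
```

A real finalizer also schedules instructions and allocates registers rather than doing a fixed one-to-many substitution, so the ratio this toy prints says nothing about the measured results reported in the abstract.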
