Performance Characterization of the 64-bit x86 Architecture from Compiler Optimizations' Perspective

Abstract

Intel Extended Memory 64 Technology (EM64T) and the AMD 64-bit architecture (AMD64) are emerging 64-bit x86 architectures that are fully x86 compatible. Compared with the 32-bit x86 architecture, the 64-bit x86 architectures offer applications several new features. For instance, applications can address 64 bits of virtual memory space, perform operations on 64-bit-wide operands, access 16 general-purpose registers (GPRs) and 16 extended multi-media (XMM) registers, and use a register-based argument passing convention. In this paper, we investigate the performance impact of these new features from the standpoint of compiler optimizations. Our research compiler is based on the Intel Fortran/C++ production compiler, and our experiments are conducted on the SPEC2000 benchmark suite. Results show that with 64-bit-wide pointer and long data types, several SPEC2000 C benchmarks slow down by more than 20%, mainly because of the enlarged memory footprint. To evaluate the performance potential of 64-bit x86 architectures, we designed and implemented the LP32 code model, in which pointer and long are 32 bits wide. Our experiments demonstrate that, on average, the LP32 code model speeds up the SPEC2000 C benchmarks by 13.4%. For the register-based argument passing convention, our experiments show a performance gain of less than 1%, because aggressive function inlining already removes most calls. Finally, we observe that using 16 GPRs and 16 XMM registers significantly outperforms using only 8 GPRs and 8 XMM registers. However, our results also show that using 12 GPRs and 12 XMM registers performs competitively with using 16 GPRs and 16 XMM registers.
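
As a rough illustration of the memory-footprint effect mentioned in the abstract, the following Python sketch estimates the size of a C struct under a 64-bit pointer/long model and under a 32-bit pointer/long model such as LP32. It is not taken from the paper: the `struct_size` helper, the `NODE` layout, and the assumption of natural alignment (each scalar aligned to its own size) are hypothetical choices made only for this example.

```python
# Hypothetical layout calculator (an assumption for illustration, not the
# paper's tooling). It applies the usual C struct rules: each field is
# aligned to its own size, and the total is rounded up to the largest
# field alignment.

# Sizes in bytes of the C types under each data model (int is 4 in both).
LP64 = {"int": 4, "long": 8, "ptr": 8}
LP32 = {"int": 4, "long": 4, "ptr": 4}


def struct_size(fields, model):
    """Return the padded size of a struct given its field type names."""
    offset = 0
    max_align = 1
    for type_name in fields:
        size = model[type_name]
        align = size  # assumed natural alignment
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offset += size
        max_align = max(max_align, align)
    # Tail padding so that every element of an array of structs stays aligned.
    return (offset + max_align - 1) // max_align * max_align


# A made-up tree node:
#   struct node { int key; struct node *left, *right; long count; };
NODE = ["int", "ptr", "ptr", "long"]

if __name__ == "__main__":
    for name, model in (("LP64", LP64), ("LP32", LP32)):
        print(name, struct_size(NODE, model), "bytes per node")
    # prints:
    #   LP64 32 bytes per node
    #   LP32 16 bytes per node
```

Under these assumptions the pointer-heavy node doubles in size when pointer and long grow to 64 bits, so fewer nodes fit in each cache line. This is the kind of footprint growth the abstract points to; the actual slowdowns depend on the benchmark and are reported only in the paper.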
