Conference paper, International Workshop on Embedded Multicore Systems

An Empirical Evaluation of Design Abstraction and Performance of Thrust Framework


Abstract

High-performance computing applications are notoriously difficult to write, so practitioners expect well-tuned software to remain useful for a long time and to keep delivering optimized performance even when the hardware is upgraded. Such software may also need enough abstraction over the hardware to run on heterogeneous architectures. What is required, therefore, is a programming abstraction that balances abstraction against visibility into the hardware, so that programmers can write programs without understanding every hardware nuance yet still exploit the available compute power fully. In this paper we analyze both the expressive power of the design abstraction and the performance of Thrust, a popular abstraction framework. We show quantitatively that, although an application is easier to write in Thrust than in native CUDA or OpenMP, the framework gives the programmer no abstraction over the memory hierarchy of the underlying backend. We compare the performance of three Thrust applications with their corresponding native versions on the CUDA, OpenMP, Xeon Phi and CPP backends, and demonstrate that the current version of Thrust performs poorly in most cases when the application is compute-intensive. For applications that are not compute-intensive, however, the framework delivers close to native performance. We analyze the reasons for this behavior and highlight the improvements the framework needs.