Journal: The Computer Journal

Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures

Abstract

This paper presents a benchmarking, performance analysis and optimization study of the OP2 ‘active’ library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance by re-targeting the application to execute on different multi-core/many-core hardware. Runtime performance results are presented for a representative unstructured mesh application on a variety of many-core processor systems, including traditional x86 architectures from Intel (Xeon processors based on the older Penryn and current Nehalem micro-architectures) and GPU offerings from NVIDIA (GTX 260, Tesla C2050). Our analysis contrasts the performance of CPU (OpenMP) and GPU (CUDA) parallel implementations for the solution of an industrial-sized unstructured mesh consisting of about 1.5 million edges. Results show the significance of choosing the correct partition and thread-block configuration, identify the factors limiting GPU performance and offer insights into optimizations for improved performance.