Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures

M.B. Giles; G.R. Mudalige; Z. Sharif; G. Markall; P.H.J. Kelly


Abstract

This paper presents a benchmarking, performance analysis and optimization study of the OP2 ‘active’ library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targeting the application to execute on different multi-core/many-core hardware. Runtime performance results are presented for a representative unstructured mesh application on a variety of many-core processor systems, including traditional X86 architectures from Intel (Xeon based on the older Penryn and current Nehalem micro-architectures) and GPU offerings from NVIDIA (GTX260, Tesla C2050). Our analysis demonstrates the contrasting performance between the use of CPU (OpenMP) and GPU (CUDA) parallel implementations for the solution of an industrial-sized unstructured mesh consisting of about 1.5 million edges. Results show the significance of choosing the correct partition and thread-block configuration, the factors limiting the GPU performance and insights into optimizations for improved performance.
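To make the role of the partition (block) size concrete, the following is a minimal sketch of an edge-based loop over an unstructured mesh, where each edge updates its two end nodes through an indirect mapping. It is not OP2's API: the function names `color_blocks` and `edge_loop`, the greedy block coloring, and the toy flux formula are all assumptions chosen for illustration. Block coloring is one common way to let blocks that share no nodes run concurrently (for example one block per OpenMP thread or CUDA thread block) without conflicting updates; here the blocks are executed serially.

```python
from collections import defaultdict


def color_blocks(edges, block_size):
    """Split edges into consecutive blocks and greedily color the blocks so
    that two blocks sharing a node never receive the same color.
    (Hypothetical helper for illustration; not part of OP2.)"""
    blocks = [edges[i:i + block_size] for i in range(0, len(edges), block_size)]
    node_colors = defaultdict(set)  # node -> colors of blocks that touch it
    colors = []
    for block in blocks:
        nodes = {n for edge in block for n in edge}
        used = set().union(*(node_colors[n] for n in nodes))
        color = 0
        while color in used:
            color += 1
        colors.append(color)
        for n in nodes:
            node_colors[n].add(color)
    return blocks, colors


def edge_loop(edges, node_values, block_size):
    """For every edge (a, b), add the value difference to node a and subtract
    it from node b: an indirect increment through the edge-to-node map.
    (Toy kernel, assumed for illustration.)"""
    residual = [0.0] * len(node_values)
    blocks, colors = color_blocks(edges, block_size)
    for color in range(max(colors, default=-1) + 1):
        # Blocks of one color touch disjoint nodes, so a parallel backend
        # could process them at the same time; this sketch runs them in turn.
        for block, c in zip(blocks, colors):
            if c != color:
                continue
            for a, b in block:
                flux = node_values[b] - node_values[a]
                residual[a] += flux
                residual[b] -= flux
    return residual


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # four edges forming a ring
    print(color_blocks(ring, 1)[1])
    print(edge_loop(ring, [1.0, 2.0, 4.0, 8.0], 2))
```

In this sketch a larger `block_size` gives fewer, bigger blocks (less coloring overhead but fewer blocks available to run at once), which is the kind of trade-off behind choosing a partition and thread-block configuration.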


Bibliographic details

Source: The Computer Journal, 2012, Issue 2, pp. 168-180 (13 pages)
Authors: M.B. Giles; G.R. Mudalige; Z. Sharif; G. Markall; P.H.J. Kelly
Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English

Similar literature

1. Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures [J]. M.B. Giles, G.R. Mudalige, Z. Sharif. The Computer Journal, 2012, Issue 2.
2. Performance Analysis of the OP2 Framework on Many-core Architectures [J]. M.B. Giles, G.R. Mudalige, Z. Sharif. Performance Evaluation Review, 2011, Issue 4.
3. Optimizing the performance of reactive molecular dynamics simulations for many-core architectures [J]. Aktulga Hasan Metin, Knight Chris, Coffman Paul. International Journal of High Performance Computing Applications, 2019, Issue 2.
4. OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures [C]. Mudalige G.R., Giles M.B., Reguly I. 2012 Innovative Parallel Computing, 2012.
5. An efficient design space exploration framework to optimize power-efficient heterogeneous many-core multi-threading embedded processor architectures [D]. Datta, Kushal. 2011.
6. High-Performance 3D Compressive Sensing MRI Reconstruction Using Many-Core Architectures [O]. Daehyun Kim, Joshua Trzasko, Mikhail Smelyanskiy. 2011.
7. Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures [O]. M. B. Giles, G. R. Mudalige, Z. Sharif. 2011.
