Performance Modeling of Parallel Loops on Multi-Socket Platforms Using Queueing Systems

Abstract

Predicting how the performance of parallel loops on modern shared-memory multi-socket multi-core systems depends on the allocated resources is an important means of achieving better system utilization. Previous prediction techniques are tied to specific architectures and cannot make purely online performance predictions, because they require an offline analysis of the parallel program. This paper presents a practical approach based on queueing theory that models the performance of parallel programs as a function of the number of allocated cores. Based on the key insight that the scalability of scientific parallel loops is limited by memory performance, a hierarchically constructed M/M/1/N/N queueing system is used to analytically compute the response time at the different congestion points in the memory system of modern NUMA architectures. After the model is automatically tuned to a specific architecture by executing a number of micro-benchmarks, the required parameter values are obtained at runtime from the hardware performance counters present in modern commodity AMD and Intel processors. Evaluated with 24 OpenMP parallel loops on a 64-core AMD and a 72-core Intel multi-socket platform, the presented queueing system accurately predicts the speedup of parallel loops, with a mean absolute percentage error of 8.3 percent on the AMD system and 6.7 percent on the Intel platform.
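
To make the M/M/1/N/N notation in the abstract concrete, the following is a minimal sketch of a single finite-source queue: N cores alternate between computing and waiting on one shared server (for example, a memory controller). It uses the standard textbook steady-state solution for this queue, not the paper's hierarchical model. The function name, the parameter names, and the rates in the usage loop are illustrative assumptions.

```python
from math import factorial


def finite_source_mm1(n_cores, request_rate, service_rate):
    """Steady-state metrics of one M/M/1 queue fed by n_cores sources (M/M/1/N/N).

    request_rate: rate at which a single computing core issues a request
                  (hypothetical parameter, not taken from the paper).
    service_rate: rate at which the single server completes requests.
    Returns (throughput, mean_response_time).
    """
    rho = request_rate / service_rate
    # Unnormalized probability of k requests at the server:
    # N! / (N - k)! * rho^k
    weights = [
        factorial(n_cores) // factorial(n_cores - k) * rho**k
        for k in range(n_cores + 1)
    ]
    total = sum(weights)
    probs = [w / total for w in weights]

    # The server completes requests at service_rate whenever it is not idle.
    throughput = service_rate * (1.0 - probs[0])
    mean_in_system = sum(k * p for k, p in enumerate(probs))
    # Little's law: mean response time = mean number at server / throughput.
    response_time = mean_in_system / throughput
    return throughput, response_time


if __name__ == "__main__":
    # Illustrative rates only; a real model would derive them from measurements.
    base_throughput, _ = finite_source_mm1(1, 0.2, 1.0)
    for cores in (1, 2, 4, 8, 16):
        x, r = finite_source_mm1(cores, 0.2, 1.0)
        print(f"{cores:2d} cores: response time {r:.3f}, "
              f"relative throughput {x / base_throughput:.2f}")
```

With a single core the response time equals the mean service time. As cores are added the response time grows and the relative throughput flattens, which is the kind of memory-bound scaling limit the abstract describes.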

Bibliographic details

Source
IEEE Transactions on Parallel and Distributed Systems | 2020, Issue 2 | pp. 318-331 | 14 pages

Author affiliations

Seoul Natl Univ Sch Comp Sci & Engn Seoul 08826 South Korea;

Seoul Natl Univ Sch Comp Sci & Engn Seoul 08826 South Korea|SAP Labs Korea Seoul 06578 South Korea;

Original format: PDF
Language: English
Keywords
Computational modeling; Servers; Predictive models; Time factors; Multicore processing; Dynamic scheduling; Performance modeling; parallel loop; queueing system; multi-socket system; OpenMP; NUMA;

Similar literature

1. Performance comparison of different parallel lattice Boltzmann implementations on multi-core multi-socket systems [J]. S. Donath, K. Iglberger, G. Wellein. International Journal of Computational Science and Engineering. 2008, Issue 1.
2. Performance Evaluation of Parallel Processing Systems Using Queueing Network Model [J]. Chanintorn Jittawiriyanukoon. WSEAS Transactions on Computers. 2006, Issue 3.
3. A Study on the Closed-Loop Performance in Extrapolated Regions of Operations of Nonlinear Systems Using Parallel OBF-NN Models [J]. Zabiri Haslinda, Marappagounder Ramasamy, Lemma Tufa Dendena. Journal of Chemical Engineering of Japan. 2016, Issue 1a3.
4. Predicting the Performance of Parallel Computing Models Using Queuing System [C]. Shen Chao, Tong Weiqin, Kausar Samina. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. 2015.
5. Scalability Analysis of Parallel and Distributed Processing Systems via Fork and Join Queueing Network Models [D]. Zeng, Yun. 2018.
6. A Discrete Event Simulation Model for Evaluating the Performances of an M/G/C/C State Dependent Queuing System [O]. Ruzelan Khalid, Mohd Kamal M. Nawawi, Luthful A. Kawsar.
7. General queueing network models for computer system performance analysis. A maximum entropy method of analysis and aggregation of general queueing network models with application to computer systems [O]. El-Affendi Mohamed Ahmed. 1983.
8. Queueing Network Systems with Unbalanced Flows and Their Applications to Performance Evaluation of Highly Parallel Distributed Information Systems. Revision [R]. Wang, Y. R., Madnick, S. E. 1984.
