Models for Generating Locality-Tuned Traveling Threads for a Hierarchical Multi-level Heterogeneous Multicore

Abstract

As heterogeneous multicore processors become more widespread, many options are emerging for producing efficient parallel code for such processors. Although parallel programming languages are improving, manual partitioning of computations and data across heterogeneous processing resources is proving extraordinarily difficult. Further, it is becoming increasingly important to consider locality when producing parallel code, as data transport is a primary source of performance overhead and energy consumption. To address these problems, we propose a novel model for extracting parallel computations from sequential code for a hierarchical multi-level heterogeneous processor that we present, called the Passive/Active Multicore (PAM). The computations take the form of short, fine-grained threads, which are generated with consideration of locality through cache profiling and can migrate from core to core up through the memory hierarchy based on the location of operands. Experimental results across both integer- and floating-point-intensive standard and scientific workloads show that the architecture, execution model, and computational extraction techniques together offer computational offloads of up to 24% (5.8% on average). Through simulation, we estimate that these offloads may translate into speedups of up to 19% (4.0% on average) and that negative effects on performance are negligible. Floating-point applications seem to be most aided by these techniques.

Bibliographic details

Source
7th ACM Computing Frontiers Conference 2010 | 2010 | pp. 227-236 | 10 pages
Conference location: Bertinoro (IT)
Authors
Patrick A. La Fratta; Peter M. Kogge
Author affiliations

Department of Computer Science and Engineering, University of Notre Dame, 384 Fitzpatrick Hall, Notre Dame, IN 46556, USA (listed for both authors)

Original format: PDF
Language of text: English
Chinese Library Classification: Computing technology, computer technology
Keywords
asymmetric multicore architectures; cache hierarchy design; locality-cognizant parallelization; migrant threads; multithreaded architectures
Date indexed: 2022-08-26 14:00:03

Similar literature

1. Energy-efficient multithreading for a hierarchical heterogeneous multicore through locality-cognizant thread generation [J]. Patrick A. La Fratta, Peter M. Kogge. Journal of Parallel and Distributed Computing. 2013, No. 12
2. Bifurcation Method to Analysis of Traveling Wave Solutions for (3+1)-Dimensional Nonlinear Models Generated by the Jaulent-Miodek Hierarchy [J]. Ran Yanping, Li Jing. Partial Differential Equations (English Edition). 2018, No. 004
3. Bifurcation of Traveling Wave Solutions for (2+1)-Dimensional Nonlinear Models Generated by the Jaulent-Miodek Hierarchy [J]. Yanping Ran, Jing Li, Xin Li. Abstract and Applied Analysis. 2015, No. 4
4. Models for Generating Locality-Tuned Traveling Threads for a Hierarchical Multi-level Heterogeneous Multicore [C]. ACM Computing Frontiers Conference. 2010
5. Exploiting heterogeneous multicore processors through fine-grained scheduling and low-overhead thread migration [D]. Sawalha, Lina Hakam. 2012
6. Fold prediction by a hierarchy of sequence threading and modeling methods [O]. L. Jaroszewski, L. Rychlewski, B. Zhang. 1998
7. An Application based Efficient Thread Level Parallelism Scheme on Heterogeneous Multicore Embedded System for Real Time Image Processing [O]. K Indragandhi, Jawahar P K. 2020