Conference: 7th ACM computing frontiers conference 2010
Models for Generating Locality-Tuned Traveling Threads for a Hierarchical Multi-level Heterogeneous Multicore

Abstract

As heterogeneous multicore processors become more widespread, many options are emerging for producing efficient parallel code for them. Although parallel programming languages are improving, manually partitioning computations and data across heterogeneous processing resources is proving extraordinarily difficult. Further, it is becoming increasingly important to consider locality when producing parallel code, as data transport is a primary source of performance overhead and energy consumption. To address these problems, we propose a novel model for extracting parallel computations from sequential code for a hierarchical multi-level heterogeneous processor that we introduce, the Passive/Active Multicore (PAM). The computations take the form of short, fine-grained threads, which are generated with locality in mind through cache profiling and can migrate from core to core up through the memory hierarchy based on the location of their operands. Experimental results across integer- and floating-point-intensive standard and scientific workloads show that the architecture, execution model, and computational extraction techniques together offer computational offloads of up to 24% (5.8% on average). Through simulation, we estimate that these offloads may translate into speedups of up to 19% (4.0% on average) and that negative effects on performance are negligible. Floating-point applications seem to benefit most from these techniques.
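
The abstract gives no implementation detail, so the following is only an illustrative sketch of the idea that a thread moves up the memory hierarchy until it reaches a level where its operands are located. Everything in it is an assumption made for illustration and is not taken from the paper: the three level names, the inclusive-cache model, and the helper names `execution_level` and `migration_path`.

```python
# Hypothetical hierarchy, ordered from closest to the main CPU to farthest.
# The level names are assumptions; the abstract does not list PAM's levels.
LEVELS = ["L1", "L2", "DRAM"]


def execution_level(operand_addrs, resident):
    """Return the nearest level at which every operand is available.

    `resident` maps a cache level name to the set of addresses it holds.
    The last level (DRAM here) is assumed to hold every address.
    """
    for level in LEVELS[:-1]:
        if all(addr in resident[level] for addr in operand_addrs):
            return level
    return LEVELS[-1]


def migration_path(operand_addrs, resident):
    """List the levels a thread visits as it travels up to its operands."""
    target = execution_level(operand_addrs, resident)
    return LEVELS[: LEVELS.index(target) + 1]


# Toy cache contents (inclusive: everything in L1 is also in L2).
resident = {"L1": {0x10}, "L2": {0x10, 0x20}}
path = migration_path([0x10, 0x20], resident)
```

In this toy setup, a thread whose operands are all in `L1` stays at the first level, while one that touches an address found only in `L2` travels one level up. A real design would also have to weigh the cost of migrating against the cost of fetching the data.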
