
Architectural support for scalable speculative parallelization in shared-memory multiprocessors.



Abstract

Speculative parallelization aggressively executes in parallel codes that the compiler cannot fully parallelize. Past hardware proposals have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted this technique in the context of scalable shared-memory systems.

In this thesis, we present and evaluate a new hardware scheme for scalable speculative parallelization. The design requires relatively simple hardware and integrates efficiently into a cache-coherent NUMA system. We detail the design of the node, effectively utilizing a speculative CMP as the building block for our scheme.

Simulations show that the proposed architecture delivers good speedups at a modest hardware cost. For a set of important nonanalyzable scientific loops, we report average speedups of 5.2 for 16 processors. We show that our applications require support for per-word speculative state; without it, performance suffers greatly.

With speculative parallelization, codes that cannot be fully compiler-analyzed are aggressively executed in parallel. If the hardware detects a cross-thread dependence violation at run time, it squashes the offending threads and reverts to a safe state. Squashing can cripple performance, especially in scalable multiprocessors and in systems that do not support speculative state at the fine granularity of memory words.

In this thesis, we also propose a new approach to reduce the cost of handling cross-thread data dependence violations: run-time learning. Using a new module called the Violation Prediction Table, the hardware learns to stall a thread when it seems likely to trigger a squash, and to release it when it is unlikely to trigger one. Simulations of a 16-processor scalable system show that the scheme is very effective. For a protocol that keeps speculation state on a per-line basis at the system level, learning eliminates on average 84% of the squashes. The resulting system runs on average 43% faster, and its performance is very close to that of a system with perfect prediction.
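The abstract's point about per-word versus per-line speculative state can be illustrated with a toy model. This sketch is not the thesis's actual protocol: the line size and the dependence check are simplified assumptions, chosen only to show how line-granularity tracking flags a spurious violation (and hence a squash) when two threads touch different words of the same cache line.

```python
# Toy model of speculative-state granularity. A "violation" is flagged when
# a predecessor thread's write overlaps a successor thread's speculative
# read, as seen at the chosen tracking granularity. Addresses are word
# indices; a hypothetical cache line holds 8 words.

LINE_WORDS = 8  # illustrative line size, not from the thesis

def line_of(word_addr):
    """Cache line containing a given word address."""
    return word_addr // LINE_WORDS

def violates(write_addr, read_addr, granularity):
    """True if the protocol would flag a cross-thread dependence violation."""
    if granularity == "word":
        return write_addr == read_addr          # only true dependences
    return line_of(write_addr) == line_of(read_addr)  # per-line: false sharing too

# Predecessor writes word 3; successor speculatively reads word 5.
# Both words fall in line 0, but there is no true dependence.
assert not violates(3, 5, "word")   # per-word state: no squash
assert violates(3, 5, "line")       # per-line state: spurious squash
```

Under per-line tracking, every such false-sharing pattern costs a squash and re-execution, which is one way the reported performance gap between the two granularities can arise.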
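The run-time learning idea can likewise be sketched in miniature. The indexing scheme, thresholds, and saturating counters below are illustrative assumptions, not the Violation Prediction Table design from the thesis; the sketch only shows the stall/release behavior the abstract describes: stall a thread whose access has recently caused squashes, release it once speculation on that access starts succeeding again.

```python
class ViolationPredictionTable:
    """Toy predictor (illustrative, not the thesis design): saturating
    counters indexed by a load's program counter. Squashes heat an entry
    up; successful commits cool it down. A hot entry means the next
    thread issuing that load should stall until predecessors commit."""

    def __init__(self, threshold=2, ceiling=3):
        self.counters = {}          # load_pc -> saturating counter
        self.threshold = threshold  # stall when counter reaches this
        self.ceiling = ceiling      # counter saturation value

    def should_stall(self, load_pc):
        return self.counters.get(load_pc, 0) >= self.threshold

    def record_squash(self, load_pc):
        c = self.counters.get(load_pc, 0)
        self.counters[load_pc] = min(c + 1, self.ceiling)

    def record_commit(self, load_pc):
        c = self.counters.get(load_pc, 0)
        self.counters[load_pc] = max(c - 1, 0)

vpt = ViolationPredictionTable()
assert not vpt.should_stall(0x400)   # unseen load: speculate freely
vpt.record_squash(0x400)
vpt.record_squash(0x400)
assert vpt.should_stall(0x400)       # repeated squashes: stall next time
vpt.record_commit(0x400)
assert not vpt.should_stall(0x400)   # speculation succeeding: release
```

Stalling trades some parallelism for avoided re-execution, which is consistent with the abstract's result that eliminating most squashes yields a large net speedup.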

Record details

  • Author

    Cintra, Marcelo Hehl.

  • Author affiliation

    University of Illinois at Urbana-Champaign.

  • Degree grantor: University of Illinois at Urbana-Champaign.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 108 p.
  • Total pages: 108
  • Format: PDF
  • Language: eng
  • CLC classification: Automation technology, computer technology
  • Keywords

  • Date added: 2022-08-17 11:47:10

