Thread scheduling and memory coalescing for dynamic vectorization of SPMD workloads

Teo Milanez; Sylvain Collange; Fernando Magno Quintao Pereira; Wagner Meira Jr.; Renato Ferreira

Abstract

Simultaneous Multi-Threading (SMT) is a hardware model in which different threads share the same processing unit. This model is a compromise between high parallelism and low hardware cost. Minimal Multi-Threading (MMT) is a recently proposed architecture that shares instruction decoding and execution between threads running the same program in an SMT processor, thereby generalizing the approach followed by Graphics Processing Units to general-purpose processors. In this paper we propose new ways to expose redundancies in the MMT execution model. First, we propose and evaluate a new thread reconvergence heuristic that handles function calls better than previous approaches. Our heuristic only inspects the program counter and the stack frame to reconverge threads; hence, it is amenable to efficient and inexpensive hardware implementation. Second, we demonstrate that this heuristic is able to reveal the existence of substantial regularity in inter-thread memory access patterns. We validate our results on data-parallel applications from the PARSEC and SPLASH suites. Our new reconvergence heuristic increases the throughput of our MMT model by 7% compared to a previous, substantially more complex approach due to Long et al. Moreover, it gives us an effective way to increase regularity in memory accesses. We have observed that over 70% of simultaneous memory accesses are either the same for all the threads, or are affine expressions of the thread identifier. This observation motivates the design of newly proposed hardware that benefits from regularity in inter-thread memory accesses.
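
The two regular access patterns named in the abstract (the same address for all threads, or an address that is an affine expression of the thread identifier) can be illustrated with a small sketch. This is not the paper's hardware mechanism or its measurement code; the function name `classify_access`, the list-of-addresses input, and the three labels are assumptions made purely for illustration. It assumes `addresses[t]` holds the address issued by the thread with identifier `t` for one simultaneous access.

```python
from typing import Sequence


def classify_access(addresses: Sequence[int]) -> str:
    """Classify one simultaneous memory access across threads.

    addresses[t] is the address issued by the thread with identifier t
    (a hypothetical representation chosen for this sketch).
    Returns "uniform", "affine" or "irregular".
    """
    if not addresses:
        raise ValueError("need at least one thread address")

    base = addresses[0]

    # Uniform: every thread requests the same address.
    if all(a == base for a in addresses):
        return "uniform"

    # Affine: address(t) = base + stride * t for a single constant stride.
    # At this point at least two addresses exist, since they are not all equal.
    stride = addresses[1] - base
    if all(a == base + stride * t for t, a in enumerate(addresses)):
        return "affine"

    return "irregular"


if __name__ == "__main__":
    print(classify_access([0x1000, 0x1000, 0x1000, 0x1000]))
    print(classify_access([0x1000, 0x1004, 0x1008, 0x100C]))
    print(classify_access([0x1000, 0x1004, 0x1010, 0x100C]))
```

In this sketch a uniform access is simply the affine case with stride zero; it is reported separately because the abstract distinguishes the two. An access classified as uniform or affine can be described by a base and a stride alone, which is the kind of regularity the abstract says motivates dedicated hardware.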

Bibliographic details

Source
Parallel Computing | 2014, Issue 9 | pp. 548-558 | 11 pages
Authors
Teo Milanez; Sylvain Collange; Fernando Magno Quintao Pereira; Wagner Meira Jr.; Renato Ferreira
Author affiliations

Av. Antonio Carlos 6, 627, ICEx, CEP 31270-010, Belo Horizonte, Brazil;

Campus de Beaulieu, 35042 Rennes Cedex, France;

Av. Antonio Carlos 6, 627, ICEx, CEP 31270-010, Belo Horizonte, Brazil;

Av. Antonio Carlos 6, 627, ICEx, CEP 31270-010, Belo Horizonte, Brazil;

Av. Antonio Carlos 6, 627, ICEx, CEP 31270-010, Belo Horizonte, Brazil;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Computer Architecture; Parallelism; SIMD; SMT; MMT; High performance