
Parallel application memory scheduling


Abstract

A primary use of chip-multiprocessor (CMP) systems is to speed up a single application by exploiting thread-level parallelism. In such systems, threads may slow each other down by issuing memory requests that interfere in the shared memory subsystem. This inter-thread memory system interference can significantly degrade parallel application performance. Better memory request scheduling may mitigate such performance degradation. However, previously proposed memory scheduling algorithms for CMPs are designed for multi-programmed workloads where each core runs an independent application, and thus do not take into account the inter-dependent nature of threads in a parallel application. In this paper, we propose a memory scheduling algorithm designed specifically for parallel applications. Our approach has two main components, targeting two common synchronization primitives that cause inter-dependence of threads: locks and barriers. First, the runtime system estimates the set of limiter threads, i.e., the threads holding the locks that cause the most serialization, and the memory scheduler prioritizes their requests. Second, the memory scheduler shuffles thread priorities to reduce the time threads take to reach the barrier. We show that our memory scheduler speeds up a set of memory-intensive parallel applications by 12.6% compared to the best previous memory scheduling technique.
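The two mechanisms in the abstract can be illustrated with a minimal sketch of a memory-controller request picker: limiter threads are served first, and the remaining threads are periodically re-ranked so no single thread lags behind and delays the group at the next barrier. The names here (MemRequest, pick_next_request, reshuffle) and the simplified row-hit tiebreak are illustrative assumptions, not the paper's implementation.

import random
from dataclasses import dataclass

@dataclass
class MemRequest:
    thread_id: int
    row_hit: bool   # does the request hit the currently open DRAM row?

def pick_next_request(queue, limiter_threads, shuffled_order):
    """Select the next request to service from the controller queue.
    Priority (highest first):
      1. requests from limiter threads (estimated holders of contended locks)
      2. requests from threads ranked earlier in the current shuffled order
      3. row-buffer hits over misses (a common DRAM scheduling tiebreak)
    """
    def rank(req):
        is_limiter = req.thread_id in limiter_threads
        return (not is_limiter, shuffled_order.index(req.thread_id), not req.row_hit)
    return min(queue, key=rank)

def reshuffle(thread_ids):
    """Re-randomize the priority order of non-limiter threads between barriers
    so that no single thread consistently lags and stalls the whole group."""
    order = list(thread_ids)
    random.shuffle(order)
    return order

# Example: thread 2 holds a contended lock, so its request is served even
# though thread 0's request is a row-buffer hit.
queue = [MemRequest(0, True), MemRequest(2, False), MemRequest(3, False)]
order = reshuffle([0, 1, 2, 3])
best = pick_next_request(queue, limiter_threads={2}, shuffled_order=order)
print(best.thread_id)  # -> 2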
