ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers

Abstract

Modern chip multiprocessor (CMP) systems employ multiple memory controllers to control access to main memory. The scheduling algorithm employed by these memory controllers has a significant effect on system throughput, so choosing an efficient scheduling algorithm is important. The scheduling algorithm also needs to be scalable: as the number of cores increases, the number of memory controllers shared by the cores should also increase to provide enough bandwidth to feed the cores. Unfortunately, previous memory scheduling algorithms are inefficient with respect to system throughput and/or are designed for a single memory controller and do not scale well to multiple memory controllers, requiring significant fine-grained coordination among controllers. This paper proposes ATLAS (Adaptive per-Thread Least-Attained-Service memory scheduling), a fundamentally new memory scheduling technique that improves system throughput without requiring significant coordination among memory controllers. The key idea is to periodically order threads based on the service they have attained from the memory controllers so far, and, in each period, to prioritize the threads that have attained the least service over the others. The idea of favoring threads with least attained service is borrowed from the queueing theory literature, where, in the context of a single-server queue, it is known that least-attained-service scheduling is optimal for jobs whose sizes follow a Pareto (or any decreasing hazard rate) distribution. After verifying that our workloads have this characteristic, we show that our implementation of least-attained-service thread prioritization reduces the time the cores spend stalling and significantly improves system throughput. Furthermore, since the periods over which we accumulate the attained service are long, the controllers coordinate very infrequently to form the ordering of threads, which makes ATLAS scalable to many controllers.
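The optimality result cited above holds for job-size distributions whose hazard rate decreases with age. As a short check, using standard notation that is not taken from the paper (scale $x_m > 0$, shape $k > 0$), the Pareto distribution has this property:

```latex
\begin{aligned}
\bar F(x) &= \Pr[X > x] = \left(\frac{x_m}{x}\right)^{k}, \qquad x \ge x_m, \\
f(x) &= -\frac{d\bar F(x)}{dx} = \frac{k\, x_m^{k}}{x^{k+1}}, \\
h(x) &= \frac{f(x)}{\bar F(x)} = \frac{k\, x_m^{k} / x^{k+1}}{x_m^{k} / x^{k}} = \frac{k}{x}.
\end{aligned}
```

Since $h(x) = k/x$ falls as $x$ grows, a job that has already received more service is less likely to finish in the next instant, which is why serving the job with the least attained service first tends to complete short jobs sooner.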
We evaluate ATLAS on a wide variety of multiprogrammed SPEC 2006 workloads and systems with 4-32 cores and 1-16 memory controllers, and compare its performance against five previously proposed scheduling algorithms. Averaged over 32 workloads on a 24-core system with 4 controllers, ATLAS improves instruction throughput by 10.8% and system throughput by 8.4% compared to PAR-BS, the best previous CMP memory scheduling algorithm. ATLAS's performance benefit increases as the number of cores increases.
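The abstract describes ATLAS only at a high level. The Python sketch below illustrates the general shape of least-attained-service thread ranking across several controllers; it is not the paper's implementation. The `Controller` and `Request` classes, the `end_of_quantum` function, the use of busy cycles as the service metric, the tie-breaking order (thread rank, then row hit, then age), and the history weight `alpha` are all hypothetical choices made for illustration. What it shows is that controllers exchange per-thread totals only once per quantum; between quanta each controller schedules on its own using the last ranking it received.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    thread_id: int
    arrival: int   # cycle at which the request entered the queue
    row_hit: bool  # True if it targets the row already open in its bank


class Controller:
    """One memory controller in this sketch (hypothetical interface)."""

    def __init__(self) -> None:
        # Service (assumed: busy cycles) given to each thread this quantum.
        self.service: Dict[int, int] = defaultdict(int)
        # Latest ranking received at the end of a quantum (0 = highest priority).
        self.rank: Dict[int, int] = {}

    def record_service(self, thread_id: int, cycles: int) -> None:
        self.service[thread_id] += cycles

    def pick_next(self, queue: List[Request]) -> Request:
        # Assumed priority order: better (lower) thread rank, then row hits,
        # then oldest. A thread missing from the ranking has attained no
        # service yet, so it is treated as least-attained (rank -1).
        return min(
            queue,
            key=lambda r: (self.rank.get(r.thread_id, -1), not r.row_hit, r.arrival),
        )


def end_of_quantum(
    controllers: List[Controller],
    total: Dict[int, float],
    alpha: float = 0.875,
) -> Dict[int, int]:
    """The only cross-controller step in the sketch: aggregate, rank, broadcast.

    `alpha` is an assumed, illustrative history weight that blends earlier
    quanta into the attained-service total; alpha=0 ranks on the last quantum only.
    """
    this_quantum: Dict[int, int] = defaultdict(int)
    for mc in controllers:
        for tid, cycles in mc.service.items():
            this_quantum[tid] += cycles
        mc.service.clear()  # start counting afresh for the next quantum

    for tid in set(total) | set(this_quantum):
        total[tid] = alpha * total.get(tid, 0.0) + (1 - alpha) * this_quantum[tid]

    # Least attained service gets rank 0; ties are broken by thread id.
    order = sorted(total, key=lambda tid: (total[tid], tid))
    ranks = {tid: i for i, tid in enumerate(order)}
    for mc in controllers:
        mc.rank = dict(ranks)
    return ranks


mcs = [Controller(), Controller()]
totals: Dict[int, float] = {}
mcs[0].record_service(0, 500)  # thread 0: memory-intensive, served by both controllers
mcs[1].record_service(0, 400)
mcs[0].record_service(1, 50)   # thread 1: light
mcs[1].record_service(2, 200)  # thread 2: moderate
print(end_of_quantum(mcs, totals))  # {1: 0, 2: 1, 0: 2}

queue = [Request(0, arrival=1, row_hit=True), Request(1, arrival=5, row_hit=False)]
print(mcs[0].pick_next(queue).thread_id)  # 1
```

In the usage lines, the light thread 1 is served first even though its request is a row miss that arrived later. Because all coordination happens in `end_of_quantum`, lengthening the quantum reduces how often controllers must communicate, which is the scalability argument the abstract makes.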

Bibliographic record

Source: IEEE International Symposium on High Performance Computer Architecture, 2010, 12 pages
Authors: Yoongu Kim; Dongsu Han; Onur Mutlu; Mor Harchol-Balter
Original format: PDF
Chinese Library Classification: TP3-53

Similar documents

1. Exploring the Spectrum of Dynamic Scheduling Algorithms for Scalable Distributed-Memory Ray Tracing [J]. Navratil P.A., Childs H., Fussell D.S. IEEE Transactions on Visualization and Computer Graphics, 2014, no. 6.
2. Improved and Competitive Algorithms for Large Scale Multiple Resource-Constrained Project-Scheduling Problems [J]. Mohammad Rostami, Dariush Moradinezhad, Azadeh Soufipour. KSCE Journal of Civil Engineering, 2014, no. 5.
3. Potential of hyphenated ultra-high performance liquid chromatography-scheduled multiple reaction monitoring algorithm for large-scale quantitative analysis of traditional Chinese medicines [J]. Song Qingqing, Song Yuelin, Zhang Na. RSC Advances, 2015, no. 71.
4. ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers [C]. Yoongu Kim, Dongsu Han, Onur Mutlu. 2010 IEEE 16th International Symposium on High Performance Computer Architecture, 2010.
5. Design of a smart non-volatile memory controller: Architecture modeling, systems analysis, parallel I/O processing and scheduling algorithms [D]. Jung, Myoungsoo. 2013.
6. Preserved Central Memory and Activated Effector Memory CD4+ T-Cell Subsets in Human Immunodeficiency Virus Controllers: an ANRS EP36 Study [O]. Simon J. Potter, Christine Lacabaratz, Olivier Lambotte. 2007.
7. ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers [O]. Kim, Yoongu; Han, Dongsu; Mutlu, Onur. 2010.
8. Analysis of Multiple-Queue Task Scheduling Algorithms for Multiple-SIMD Machines [R]. Tuomenoksa, D. L.; Siegel, H. J. 1982.