Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors

Abstract

Parallelism is the key to continued performance scaling in modern microprocessors. Yet we observe that this parallelism can often contain a surprising amount of instruction redundancy. We propose to exploit this redundancy to improve performance and decrease energy consumption. We propose a multi-threading micro-architecture, Minimal Multi-Threading (MMT), that leverages register renaming and the instruction window to combine the fetch and execution of identical instructions between threads in SPMD applications. While many techniques exploit intra-thread similarities by detecting when a later instruction may use an earlier result, MMT exploits inter-thread similarities by, whenever possible, fetching instructions from different threads together and only splitting them if the computation is unique. With two threads, our design achieves a speedup of 1.15 (geometric mean) over a two-thread traditional SMT with a trace cache. With four threads, our design achieves a speedup of 1.25 (geometric mean) over a traditional SMT processor with four threads and a trace cache. These correspond to speedups of 1.5 and 1.84 over a traditional out-of-order processor. Moreover, our performance increases in most applications with no power increase because the increase in overhead is countered with a decrease in cache accesses, leading to a decrease in energy consumption for all applications.
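
The idea of fetching identical instructions together and splitting only when the computation differs can be illustrated with a toy counting model. This is only a sketch under strong assumptions, not the MMT hardware design: the record layout `(pc, opcode, operand_values)`, the function `count_work`, and the lockstep execution of equal-length threads are all hypothetical, and the model ignores the register renaming and instruction window machinery that the abstract names.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction record: (pc, opcode, operand_values).
Instr = Tuple[int, str, Tuple[int, ...]]


def count_work(threads: List[List[Instr]]) -> Dict[str, int]:
    """Count fetches and executions for threads stepped in lockstep.

    Assumes all threads have the same length; zip() stops at the shortest.
    """
    # Without merging, every thread fetches and executes each instruction itself.
    baseline = sum(len(t) for t in threads)
    fetches = 0
    executions = 0
    for step in zip(*threads):  # one instruction per thread per step
        # Threads sitting at the same PC share a single fetch.
        fetches += len({pc for pc, _, _ in step})
        # Same PC, opcode, and operand values: execute once for all threads.
        # Any difference splits the instruction into distinct executions.
        executions += len(set(step))
    return {"baseline": baseline, "fetches": fetches, "executions": executions}


# Two SPMD-style threads that diverge only in the last operand value.
t0 = [(0, "load", (100,)), (4, "add", (1, 2)), (8, "mul", (3, 5))]
t1 = [(0, "load", (100,)), (4, "add", (1, 2)), (8, "mul", (3, 7))]
result = count_work([t0, t1])
print(result)  # {'baseline': 6, 'fetches': 3, 'executions': 4}
```

In this example the two threads share all three fetches and two of the three executions, so only the final `mul` is split. Real savings depend on how often threads stay synchronized, which the toy model simply assumes.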

Bibliographic details

Source: Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010, pp. 337-348 (12 pages)
Authors: Long Guoping; Franklin Diana; Biswas Susmit; Ortiz Pablo; Oberg Jason; Fan Dongrui; Chong Frederic T.
Original format: PDF
Chinese Library Classification: Overall structure, system architecture
Keywords: Instruction Redundancy; Parallel Processing; SMT

Similar literature

1. Recalling instructions from idling threads to maximize resource utilization for simultaneous multi-threading processors [J]. Yilin Zhang, Caleb Douglas, Wei-Ming Lin. Computers and Electrical Engineering, 2013, No. 7.
2. Adaptive instruction dispatching techniques for Simultaneous Multi-Threading (SMT) processors [J]. Monobrata Debnath, Wei-Ming Lin, Eugene John. Computers and Electrical Engineering, 2012, No. 6.
3. An instruction-systolic programmable shader architecture for multi-threaded 3D graphics processing [J]. Jung-Wook Park, Hoon-Mo Yang, Gi-Ho Park. Journal of Parallel and Distributed Computing, 2010, No. 11.
4. Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors [C]. Long Guoping, Franklin Diana, Biswas Susmit. Annual IEEE/ACM International Symposium on Microarchitecture, 2010.
5. A CORBA-based distributed and multi-threaded algorithm for finding related records in a large data set [D]. Hayes, Donald L. 2008.
6. Low-latency multi-threaded processing of neuronal signals for brain-computer interfaces [O]. Jörg Fischer, Tomislav Milekovic, Gerhard Schneider. 2014.
7. Minimal Multi-Threading: Finding and Removing Redundant Instructions in Multi-Threaded Processors [O]. Guoping Long, Diana Franklin, Susmit Biswas. 2013.
