Architectural support for thread communications in multi-core processors

Sevin Varoglu; Stephen Jenks


Abstract

In the ongoing quest for greater computational power, efficiently exploiting parallelism is of paramount importance. Architectural trends have shifted from improving single-threaded application performance, often achieved through instruction level parallelism (ILP), to improving multithreaded application performance by supporting thread level parallelism (TLP). Thus, multi-core processors incorporating two or more cores on a single die have become ubiquitous. To achieve concurrent execution on multi-core processors, applications must be explicitly restructured to exploit parallelism, either by programmers or compilers. However, multithreaded parallel programming may introduce overhead due to communications among threads. Though some resources are shared among processor cores, current multi-core processors provide no explicit communications support for multithreaded applications that takes advantage of the proximity between cores. Currently, inter-core communications depend on cache coherence, resulting in demand-based cache line transfers with their inherent latency and overhead. In this paper, we explore two approaches to improve communications support for multithreaded applications. Prepushing is a software-controlled data forwarding technique that sends data to the destination's cache before it is needed, eliminating cache misses in the destination's cache as well as reducing the coherence traffic on the bus. Software Controlled Eviction (SCE) improves thread communications by placing shared data in shared caches so that it can be found in a much closer location than remote caches or main memory. Simulation results show significant performance improvement with the addition of these architecture optimizations to multi-core processors.
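The difference between demand-based transfers, prepushing, and SCE can be illustrated with a toy model. The sketch below is not the simulator used in the paper. It assumes one producer core and one consumer core, each with a private L1 cache of unlimited capacity, plus one shared L2 cache, and it only records where the consumer finds each line. The function name `simulate` and the policy labels `"demand"`, `"prepush"`, and `"sce"` are hypothetical names chosen for this illustration.

```python
from collections import Counter


def simulate(num_lines: int, policy: str) -> Counter:
    """Count where the consumer finds each line: 'local', 'shared', or 'remote'.

    Toy model (assumption): caches are unbounded sets of line addresses,
    and the producer finishes writing before the consumer starts reading.
    """
    if policy not in {"demand", "prepush", "sce"}:
        raise ValueError(f"unknown policy: {policy}")

    producer_l1, consumer_l1, shared_l2 = set(), set(), set()

    # Producer phase: write every line once.
    for addr in range(num_lines):
        producer_l1.add(addr)
        if policy == "prepush":
            # Prepushing: forward a copy to the consumer's private cache
            # before the consumer asks for it.
            consumer_l1.add(addr)
        elif policy == "sce":
            # Software Controlled Eviction: move the line out of the
            # producer's private cache into the shared cache.
            producer_l1.discard(addr)
            shared_l2.add(addr)

    # Consumer phase: read every line once and record where it was found.
    found = Counter()
    for addr in range(num_lines):
        if addr in consumer_l1:
            found["local"] += 1    # hit in the consumer's own cache
        elif addr in shared_l2:
            found["shared"] += 1   # served by the shared cache
        else:
            found["remote"] += 1   # demand-based transfer from the producer's cache
        consumer_l1.add(addr)      # the line is now resident locally
    return found


for policy in ("demand", "prepush", "sce"):
    print(policy, dict(simulate(64, policy)))
```

In this model every consumer read under the demand policy needs a transfer from the remote private cache, prepushing turns all of them into local hits, and SCE serves all of them from the shared cache. The model has no latencies, capacity limits, or coherence protocol, so it says nothing about the size of the performance gain reported in the paper.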


Bibliographic record

Source
Parallel Computing, 2011, Issue 1, pp. 26-41 (16 pages)
Authors
Sevin Varoglu; Stephen Jenks
Affiliations
Department of Electrical Engineering and Computer Science, University of California, Irvine, USA (listed for both authors)
Indexed in
Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format
PDF
Language
English
Keywords
multi-core processors; parallelism and concurrency; shared memory

Similar literature

1. Reconfigurable Homogenous Multi-Core FFT Processor Architectures for Hybrid SISO/MIMO OFDM Wireless Communications [J]. Chin-Long Wey, Shin-Yo Lin, Pei-Yun Tsai, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences. 2011, Issue 7
2. Design Methodology of the Heterogeneous Multi-core Processor With the Combination of Parallelized Multi-core Simulator and Common Register File-Based Instruction Set Extension Architecture [J]. Bingbing Xia, Fei Qiao, Huazhong Yang, Journal of Computers. 2013, Issue 2
3. Scalable processor architecture for Java with explicit thread support [J]. Buchenrieder K., Kress R., Electronics Letters. 1997, Issue 18
4. Scalable Triadic Analysis of Large-Scale Graphs: Multi-core vs. Multi-processor vs. Multi-threaded Shared Memory Architectures [C]. Chin Jr. George, Marquez Andres, Choudhury Sutanay, The 24th International Symposium on Computer Architecture and High Performance Computing. 2012
5. Architecture support for synchronization and communications in multi-core processors [D]. Fide, Sevin, 2008
6. A Parallel Architecture for the Partitioning around Medoids (PAM) Algorithm for Scalable Multi-Core Processor Implementation with Applications in Healthcare [O]. Hassan Mushtaq, Sajid Gul Khawaja, Muhammad Usman Akram, 2018
7. Effective use of Multi-Core Architecture through Multi-Threading towards Computation Intensive Signal Processing Applications [O]. Prathmesh Deshmukh, Akhil Kurup, Shailesh.V.Kulkarni, 2015
8. Chip Multiprocessors Offer an Economical, Scalable Architecture for Future Microprocessors, Thread-Level Speculation Support Allows Them to Speed Up Past Software [R]. Hammond, L., Hubbert, B. A., Siu, M., 2000
