Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine

Abstract

Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can bring a significant run-time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA burst transfers to write SVM traditionally requires buffers to absorb transfers that miss in the TLB. These buffers have to be overprovisioned for the maximum burst size, wasting precious on-chip memory, and stall all SVM accesses once they are full, hampering the scalability of parallel accelerators. In this work, we present our SVM solution that avoids the majority of TLB misses with prefetching, supports parallel burst DMA transfers without additional buffers, and can be scaled with the workload and number of parallel processors. Our solution is based on three novel concepts: To minimize the rate of TLB misses, the TLB is proactively filled by compiler-generated Prefetching Helper Threads, which use run-time information to issue timely prefetches. To reduce the latency of TLB misses, misses are handled by a variable number of parallel Miss Handling Helper Threads. To support parallel burst DMA transfers to SVM without additional buffers, we add lightweight hardware to a standard DMA engine to detect and react to TLB misses. Compared to the state of the art, our work improves accelerator performance for memory-intensive kernels by up to 4× and by up to 60% for irregular and regular memory access patterns, respectively.
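The effect of proactively filling the TLB can be illustrated with a toy model. The sketch below is not the authors' implementation: `ToyTLB`, `run`, the 4 KiB page size, the LRU replacement policy, and the `prefetch_distance` parameter are all assumptions made for illustration. It replays an address stream through a small fully associative TLB and, when prefetching is enabled, installs the translation for an address a fixed number of accesses ahead, standing in for what a helper thread with run-time knowledge of the access pattern might do.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class ToyTLB:
    """Tiny fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, entries):
        self.entries = entries
        self.pages = OrderedDict()  # virtual page number -> present flag
        self.misses = 0

    def fill(self, vpn):
        """Install a translation, evicting the least recently used one if full."""
        self.pages[vpn] = True
        self.pages.move_to_end(vpn)
        if len(self.pages) > self.entries:
            self.pages.popitem(last=False)

    def access(self, vaddr):
        """Look up an address; on a miss, count it and fill on demand."""
        vpn = vaddr // PAGE_SIZE
        if vpn in self.pages:
            self.pages.move_to_end(vpn)
            return True
        self.misses += 1
        self.fill(vpn)
        return False


def run(addresses, entries=4, prefetch_distance=0):
    """Replay an address stream and return the number of demand TLB misses."""
    tlb = ToyTLB(entries)
    for i, vaddr in enumerate(addresses):
        if prefetch_distance:
            # Stand-in for a prefetching helper: look ahead in the known stream.
            ahead = addresses[min(i + prefetch_distance, len(addresses) - 1)]
            tlb.fill(ahead // PAGE_SIZE)
        tlb.access(vaddr)
    return tlb.misses


if __name__ == "__main__":
    stream = [i * 1024 for i in range(64)]  # regular pattern over 16 pages
    print("demand only:  ", run(stream))
    print("with prefetch:", run(stream, prefetch_distance=4))
```

In this model a regular stream misses once per page without prefetching, while a lookahead of one page's worth of accesses leaves only the very first miss. Real systems also have to account for the latency of each page table walk, which this sketch ignores.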

Bibliographic details

Source
International Conference on Computer Design | 2018 | pp. 292-300 | 9 pages
Authors
Andreas Kurth; Pirmin Vogel; Andrea Marongiu; Luca Benini
Original format: PDF
Keywords
Prefetching; Hardware; Support vector machines; Engines; System-on-chip
