Conference: International Conference on Computer Design

Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine

Abstract

Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can bring a significant run-time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA burst transfers to write SVM traditionally requires buffers to absorb transfers that miss in the TLB. These buffers have to be overprovisioned for the maximum burst size, wasting precious on-chip memory, and stall all SVM accesses once they are full, hampering the scalability of parallel accelerators. In this work, we present our SVM solution that avoids the majority of TLB misses with prefetching, supports parallel burst DMA transfers without additional buffers, and can be scaled with the workload and number of parallel processors. Our solution is based on three novel concepts: To minimize the rate of TLB misses, the TLB is proactively filled by compiler-generated Prefetching Helper Threads, which use run-time information to issue timely prefetches. To reduce the latency of TLB misses, misses are handled by a variable number of parallel Miss Handling Helper Threads. To support parallel burst DMA transfers to SVM without additional buffers, we add lightweight hardware to a standard DMA engine to detect and react to TLB misses. Compared to the state of the art, our work improves accelerator performance for memory-intensive kernels by up to 4× and by up to 60% for irregular and regular memory access patterns, respectively.
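To make the prefetching idea concrete, the following is a minimal single-threaded sketch, not the authors' implementation. It assumes a small fully associative TLB with LRU replacement, a flat dictionary as page table, and a prefetcher that knows the next few addresses of the trace (a stand-in for the run-time information the helper threads use). All names (`ToyTLB`, `run`, `PAGE_SIZE`, `prefetch_distance`) are hypothetical and do not come from the paper.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes


class ToyTLB:
    """Fully associative TLB with LRU replacement (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> physical frame number
        self.misses = 0

    def fill(self, vpn, page_table):
        """Install a translation, evicting the least recently used entry if full."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[vpn] = page_table[vpn]

    def translate(self, vaddr, page_table):
        """Translate a virtual address; on a miss, count it and fill the entry."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
        else:
            self.misses += 1
            self.fill(vpn, page_table)
        return self.entries[vpn] * PAGE_SIZE + offset


def run(addresses, page_table, tlb_capacity, prefetch_distance):
    """Replay an address trace and return the number of TLB misses.

    Before each access, the translations needed by the next
    `prefetch_distance` accesses are installed (0 disables prefetching).
    """
    tlb = ToyTLB(tlb_capacity)
    for i, vaddr in enumerate(addresses):
        for future in addresses[i + 1 : i + 1 + prefetch_distance]:
            tlb.fill(future // PAGE_SIZE, page_table)
        tlb.translate(vaddr, page_table)
    return tlb.misses


# Regular access pattern: one access per page over 64 consecutive pages.
page_table = {vpn: vpn + 1000 for vpn in range(64)}
trace = [vpn * PAGE_SIZE for vpn in range(64)]

print("misses without prefetching:", run(trace, page_table, 8, 0))
print("misses with prefetch distance 2:", run(trace, page_table, 8, 2))
```

In this toy model every access to a new page misses when prefetching is disabled, while a small prefetch distance leaves only the very first access as a miss. The model ignores timing, parallel miss handling, and the DMA engine entirely, so it says nothing about the speedups reported in the abstract.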