Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

A combined DMA and application-specific prefetching approach for tackling the memory latency bottleneck



Abstract

Memory latency has always been a major issue in embedded systems that execute memory-intensive applications, and the problem worsens as the gap between processor and memory speed continues to grow. Hardware and software prefetching have been shown to be effective in tolerating the large memory latencies inherent in large off-chip memories; however, both types of prefetching have their shortcomings. Hardware schemes are more complex and require extra circuitry to compute data access strides, while software schemes generate prefetch instructions, which, if not computed carefully, may hamper performance. On the other hand, some application domains (such as multimedia) have a uniform memory access pattern that is known a priori and that, if exploited, could yield significant performance improvements. With this characteristic in mind, we present our findings on hiding memory latency using the direct memory access (DMA) mode, which is present in all modern systems, combined with a software prefetch mechanism and a customized on-chip memory hierarchy mapping. In contrast to previous approaches, we are able to estimate the performance and power metrics without actually implementing the embedded system. Experimental results on nine well-known multimedia and imaging applications demonstrate the efficiency of our technique. Finally, we verify the performance estimations by implementing and simulating the algorithms on the TI C6201 processor.
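
To illustrate why overlapping DMA transfers with computation can hide latency when the access pattern is known in advance, the following is a minimal sketch of a first-order cycle estimate. It is not the estimation model used in the paper: the function names, the block-based double-buffering assumption, and the cycle counts are all hypothetical and chosen only for illustration.

```python
def cycles_without_prefetch(n_blocks, fetch_cycles, compute_cycles):
    """Each block is fetched from off-chip memory and then processed;
    the CPU stalls for the whole fetch, so nothing overlaps."""
    return n_blocks * (fetch_cycles + compute_cycles)


def cycles_with_dma_double_buffering(n_blocks, fetch_cycles, compute_cycles):
    """Assumed scheme: the DMA engine fills one on-chip buffer while the
    CPU processes the other, so fetch and compute overlap block by block."""
    if n_blocks == 0:
        return 0
    # The first fetch and the last compute have nothing to overlap with;
    # every step in between costs whichever of the two is slower.
    steady_state = (n_blocks - 1) * max(fetch_cycles, compute_cycles)
    return fetch_cycles + steady_state + compute_cycles


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    n, fetch, compute = 64, 400, 1000
    base = cycles_without_prefetch(n, fetch, compute)
    overlapped = cycles_with_dma_double_buffering(n, fetch, compute)
    print(f"no prefetch: {base} cycles, DMA double buffering: {overlapped} cycles")
```

Under this simplified model the fetch latency is fully hidden whenever computing on a block takes at least as long as transferring the next one; otherwise the run time is bounded by the transfer time.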
