
A low-cost high-speed twin-prefetching DSP-based shared-memory system for real-time image processing applications.


Abstract

This dissertation introduces, investigates, and evaluates a low-cost, high-speed, twin-prefetching, DSP-based, bus-interconnected shared-memory system for real-time image processing applications. The proposed architecture can effectively support 32 DSPs, in contrast to the maximum of 4 DSPs supported by existing DSP-based bus-interconnected systems. This significant enhancement is achieved by introducing two small programmable fast memories (Twins) between the processor and the shared bus interconnect. While one memory is transferring data to or from the shared memory, the other is supplying the core processor with data. Eliminating the traditional direct link between the shared bus and the processor data bus makes a wider shared bus feasible, i.e., the shared bus width becomes independent of the data bus width of the processors. The fast prefetching memories and the wider shared bus provide additional bus bandwidth, which eliminates large memory latencies; such latencies constitute the major drawback for the performance of shared-memory multiprocessors. Furthermore, in contrast to existing DSP-based uniprocessor or multiprocessor systems, the proposed architecture does not require all data to be placed in expensive on-chip or off-chip fast memory in order to reach or maintain peak performance. It can also maintain peak performance regardless of whether the processed image is small or large.

The performance of the proposed architecture has been extensively investigated by executing computationally intensive applications such as real-time high-resolution image processing. The effect of a wide variety of hardware design parameters on performance has been examined. More specifically, tables and graphs comprehensively analyze the performance of systems based on 1, 2, 4, 8, 16, 32, and 64 DSPs, for a wide variety of shared data interconnect widths such as 32, 64, 128, 256, and 512. In addition, the effect of the wide variance of temporal and spatial locality (present in different applications) on the multiprocessor's execution time is investigated and analyzed. Finally, the prefetching cache size was varied from a few kilobytes to 4 Mbytes, and the corresponding effect on the execution time was investigated. Our performance analysis has clearly shown that the execution time converges to a shallow minimum, i.e., it is not sensitive to the size of the prefetching cache. The significance of this observation is that near-optimum performance can be achieved with a small amount (16 to 300 Kbytes) of prefetching cache.
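
The alternating role of the two Twin memories described in the abstract resembles what is commonly called ping-pong (double) buffering. The sketch below is only a software illustration of that idea, not the dissertation's hardware design: the function name `process_with_twin_buffers`, the `block_size` parameter, and the `kernel` callback are hypothetical, and the prefetch and compute steps run one after the other here, whereas the abstract describes them as overlapping.

```python
from typing import Callable, List, Sequence


def process_with_twin_buffers(
    shared_memory: Sequence[int],
    block_size: int,
    kernel: Callable[[int], int],
) -> List[int]:
    """Process shared_memory block by block using two alternating buffers.

    Hypothetical illustration: names and structure are assumptions,
    not taken from the dissertation.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    twins: List[List[int]] = [[], []]  # stand-ins for the two small fast memories
    results: List[int] = []

    # Prime the pipeline: load the first block into twin 0.
    twins[0] = list(shared_memory[0:block_size])
    active = 0                 # index of the twin currently feeding the core
    next_offset = block_size   # start of the next block in shared memory

    while twins[active]:
        idle = 1 - active

        # "Bus side": prefetch the next block into the idle twin.
        # In hardware this transfer would overlap with the computation below;
        # in this sketch the two steps simply run sequentially.
        twins[idle] = list(shared_memory[next_offset:next_offset + block_size])
        next_offset += block_size

        # "Core side": the processor reads only from the active twin.
        results.extend(kernel(x) for x in twins[active])

        # Swap roles for the next block.
        active = idle

    return results
```

Calling `process_with_twin_buffers(pixels, 64, some_filter)` would apply `some_filter` to every element of `pixels` in order, with the core side never reading shared memory directly. Because the buffers are sized by `block_size` rather than by the input, the working set stays small regardless of image size, which is the property the abstract attributes to the small prefetching memories.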
