Conference: ACM international conference on Multimedia

Exploring multimedia applications locality to improve cache performance

Abstract

This research explores ways to improve the performance of multimedia processors [1]. In this context, cache memory performance plays an increasingly critical role in computer systems, since the gap between processor speed and main memory speed tends to widen rather than narrow. Integrating SIMD extensions (such as Pentium MMX, HP MAX2 or UltraSparc VIS) into the computational units, to parallelize computation on image pixels, is the main answer to the heavy workloads of multimedia applications [2]. Moreover, the workload of multimedia applications [3] has a strong impact on cache memory performance, since the locality of memory references in multimedia programs differs from that of traditional programs. As is widely known, programs exhibit two main kinds of locality: spatial and temporal. Nevertheless, as stated in [1], multimedia applications seem to present a new kind of locality, called 2D-spatial locality (i.e., after an access to a given address, there is a high probability that future accesses will fall in a two-dimensional neighborhood of it). For this reason, standard cache memory organizations achieve poorer performance when used for multimedia. To achieve an overall performance improvement on specialized multimedia processors, further architectural modifications to the memory hierarchy and its management are needed. These could be coupled with the recent idea of associating programmable components with memory, separate from the main processor, as in IRAM [4].
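
As a rough illustration of why a two-dimensional neighborhood stresses a conventional cache, the sketch below assumes a row-major image with one byte per pixel and 32-byte cache blocks. None of these parameters come from the paper, and the function names are hypothetical. It lists the cache blocks touched by the 3x3 neighborhood of a single pixel.

```python
def address(row, col, width, base=0, bytes_per_pixel=1):
    """Byte address of pixel (row, col) in a row-major image (assumed layout)."""
    return base + (row * width + col) * bytes_per_pixel


def neighborhood_blocks(row, col, width, height, radius=1, block_size=32):
    """Cache block numbers touched by the square neighborhood around a pixel."""
    blocks = set()
    # Clamp the neighborhood to the image borders.
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            blocks.add(address(r, c, width) // block_size)
    return blocks


# 3x3 neighborhood of pixel (100, 100) in a 640x480 image.
print(sorted(neighborhood_blocks(100, 100, 640, 480)))  # [1983, 2003, 2023]
```

In this example the three rows of the neighborhood fall in three blocks that are 20 blocks apart (640 / 32), so a cache that only brings in the block following the current one never fetches the rows above or below.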

The first goal of this research is to show that common multimedia applications exhibit 2D-spatial locality. To this end, we developed a benchmark that includes the most common multimedia and image processing applications. Many trace-driven simulations confirm the hypothesis [5][6].
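
For readers unfamiliar with the term, a trace-driven simulation replays a recorded sequence of memory addresses through a cache model and counts the misses. The sketch below is a minimal example of the idea, not the simulator used in [5][6]: it assumes a tiny direct-mapped cache (8 blocks of 32 bytes) and synthetic traces from scanning a 64x64 one-byte-per-pixel image, and all names are hypothetical.

```python
def count_misses(trace, num_blocks=8, block_size=32):
    """Replay an address trace through a direct-mapped cache and count misses."""
    lines = [None] * num_blocks  # block number currently held by each cache line
    misses = 0
    for addr in trace:
        block = addr // block_size
        index = block % num_blocks
        if lines[index] != block:
            misses += 1
            lines[index] = block  # fetch the block, replacing the old one
    return misses


def scan_trace(width, height, column_major=False):
    """Addresses touched when scanning a row-major image by rows or by columns."""
    if column_major:
        return [y * width + x for x in range(width) for y in range(height)]
    return [y * width + x for y in range(height) for x in range(width)]


print(count_misses(scan_trace(64, 64)))                     # 128
print(count_misses(scan_trace(64, 64, column_major=True)))  # 4096
```

The same 4096 pixels produce 128 misses when scanned along rows and 4096 when scanned along columns, which is the kind of layout sensitivity that a trace-driven experiment makes visible.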

We then explore techniques that exploit this locality to improve cache performance. Among the various techniques used to improve cache memory performance, prefetching has been one of the most studied and apparently most promising (see [7][8], where, however, no assumption of 2D spatial locality is made). Prefetching techniques can be broadly classified by whether they are implemented in software or in hardware, although some techniques take advantage of a combined software/hardware implementation [9]. A widely explored approach is hardware prefetching, which loads data into the cache before they are referenced. However, existing hardware prefetching approaches miss part of the potential performance improvement, since they are not tailored to multimedia locality. In this research we propose novel approaches to hardware prefetching for multimedia image processing programs. In particular, we address multimedia image processing, including algorithms such as the widespread MPEG-2 decoding, used to decompress audio/video streams, and typical image processing operations such as convolution, used for image filtering, and edge chain coding, used as a pre-processing step in many image analysis tasks. We omit evaluation on sound data (such as MP3 decompression or speech recognition), since these workloads exhibit typical array spatial locality and standard prefetching techniques perform well enough.
Algorithms have been selected according to how widely they are used and how differently they address data. Convolution is dominated by a regular data addressing scheme that can be predicted a priori, whereas edge chain coding is heavily data dependent: the address sequence of its data references depends on the image and cannot be statically predicted. In that case, for example, software prefetching techniques (based on compile-time prediction of future accesses) are not suitable. MPEG-2 exhibits a combination of regular addressing and data dependency.
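
The contrast between the two addressing schemes can be sketched as follows. The convolution trace depends only on the image dimensions, while the second function is a deliberately simplified stand-in for edge following (it is not a real chain-coding implementation): it steps to the first set, unvisited 8-neighbor, so the addresses it visits depend on the pixel values. The row-major layout, the function names and the test images are assumptions made for illustration.

```python
def convolution_trace(width, height, k=3):
    """Addresses read by a k x k convolution over the interior of a row-major image."""
    r = k // 2
    trace = []
    for y in range(r, height - r):
        for x in range(r, width - r):
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    trace.append((y + dy) * width + (x + dx))
    return trace


# 8-connected neighbor offsets as (dy, dx), probed in this fixed order.
NEIGHBORS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def edge_walk_trace(image, start, max_steps=100):
    """Addresses of the pixels visited by a simplified, data-dependent edge walk."""
    height, width = len(image), len(image[0])
    y, x = start
    visited = {(y, x)}
    trace = [y * width + x]
    for _ in range(max_steps):
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            inside = 0 <= ny < height and 0 <= nx < width
            if inside and image[ny][nx] and (ny, nx) not in visited:
                y, x = ny, nx
                visited.add((y, x))
                trace.append(y * width + x)
                break
        else:
            break  # no set, unvisited neighbor: the walk ends
    return trace


horizontal = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
diagonal = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

print(convolution_trace(4, 4)[:9])           # [0, 1, 2, 4, 5, 6, 8, 9, 10]
print(edge_walk_trace(horizontal, (1, 0)))   # [4, 5, 6, 7]
print(edge_walk_trace(diagonal, (0, 0)))     # [0, 5, 10, 15]
```

Both images are 4x4, so their convolution traces are identical and could be computed before the program runs, whereas the two walks visit different addresses and are known only once the pixel data has been read.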

Typical hardware prefetching techniques are not suitable in this context: techniques based on one-block-lookahead [10] exploit only 1D spatial locality, while adaptive techniques do not cope with the data dependency of some image processing algorithms.
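
The limitation of one-block-lookahead can be illustrated with a small model. The sketch below assumes a prefetch-on-miss variant (on a miss to block b, block b + 1 is fetched as well) in a fully associative LRU cache of 8 blocks of 32 bytes, replaying scans of a 64x64 one-byte-per-pixel image. These choices and the function name are assumptions for illustration, not the configuration evaluated in the paper.

```python
from collections import OrderedDict


def simulate_obl(trace, capacity=8, block_size=32, prefetch=True):
    """Count demand misses in a fully associative LRU cache, optionally with
    prefetch-on-miss one-block-lookahead."""
    cache = OrderedDict()  # block numbers, least recently used first

    def touch(block):
        if block in cache:
            cache.move_to_end(block)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = None

    misses = 0
    for addr in trace:
        block = addr // block_size
        if block in cache:
            touch(block)
        else:
            misses += 1
            touch(block)
            if prefetch:
                touch(block + 1)  # one-block-lookahead: also fetch the next block
    return misses


rows = [y * 64 + x for y in range(64) for x in range(64)]     # scan along rows
columns = [y * 64 + x for x in range(64) for y in range(64)]  # scan along columns

print(simulate_obl(rows, prefetch=False), simulate_obl(rows))        # 128 64
print(simulate_obl(columns, prefetch=False), simulate_obl(columns))  # 4096 4096
```

In this model the lookahead halves the misses of the row scan but leaves the column scan unchanged, because the block after the current one holds more pixels of the same row rather than the pixel in the next row.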

