首页> 外文会议>International Conference on Parallel Processing >A Novel Minimum Time Parallel 2-D Discrete Wavelet Transform Algorithm for General Purpose Processors
【24h】

A Novel Minimum Time Parallel 2-D Discrete Wavelet Transform Algorithm for General Purpose Processors

机译:一种新的通用处理器的最小时间并行二维离散小波变换算法

获取原文

摘要

A novel efficient inplace, multithreaded, and cache-friendly parallel 2-D wavelet transform algorithm based on the lifting transform is introduced. In order to maximize the cache utilization and consequently minimize the memory bus bandwidth use, the threads compete to work on a small memory area maximizing the chance of finding it in the cache and their synchronization is done with very low overhead without the use of any locks and relying solely on the basic compare-and-swap (CAS) atomic primitive. An implementation in the C programming language with and without the use of vector (single instruction multiple data - SIMD) instructions is provided for both single (serial) and multi (parallel) threaded single-loop DWT implementations as well as serial and parallel naive implementations using linear (row order) and strided (column order) memory access patterns for comparison. Results show a significant improvement over the single-threaded optimized implementation and a much greater improvement over both the single and multi threaded naive implementations, reaching minimum running time depending on the number of processor cores and the available memory bus bandwidth, i.e., it becomes memory bound using the minimum number of memory accesses. Given the simplicity and high speed of the lifting steps, an analysis based on the number of memory bus operations (read and write) is done for images that are larger than twice the shared cache size which establishes a lower bound for the running time of all linear memory access algorithms and also determines the maximum speed gains to be expected in relation to currently implemented parallel schemes based on the parallel execution of independent lifting steps. It also shows the optimality of the parallel algorithm presented. Finally, a comparison with currently available implementations shows the gains achieved by the proposed algorithm.
机译:介绍了一种基于升降变换的新型高效的ILPLACE,多线程和高速缓冲友好的并联2-D小波变换算法。为了最大化高速缓存利用率,并因此最小化存储器总线带宽使用,线程竞争为在小的内存区域上工作,最大化在高速缓存中找到它的机会,并且在不使用任何锁的情况下使用非常低的开销进行它们的同步并仅依赖于基本比较和交换(CAS)原子原语。为单个(串行)和多(并行)螺纹单环DWT实现以及串行和并行Naive实现提供了C编程语言的C编程语言(单指令多数据 - SIMD)指令的实施使用线性(行顺序)和string(列顺序)存储器访问模式进行比较。结果显示对单线程优化实现的显着改进,以及对单个和多线程天真的实现的更大改进,取决于处理器内核的数量和可用的内存总线带宽,即它变为内存使用最小内存访问数量的绑定。鉴于提升步骤的简单性和高速,基于存储总线操作数(读写)的分析是为大于共享高速缓存大小的两倍的图像完成的,该图像为所有的运行时间建立下限线性存储器访问算法并根据基于独立提升步骤的并行执行,确定关于当前实现的并行方案的最大速度增益。它还显示了所呈现的并行算法的最优性。最后,与目前可用的实现的比较显示了所提出的算法实现的增益。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号