Journal: Concurrency and Computation: Practice and Experience

Research and implementation of a high performance parallel computing digital down converter on graphics processing unit



Abstract

The digital down converter (DDC) is a time-intensive and data-intensive computing task and is considered a key technology in software defined radio. This paper proposes a high-performance implementation of a DDC on a graphics processing unit (GPU) using CUDA. The DDC is composed of a numerically controlled oscillator (NCO) stage, a cascaded integrator-comb (CIC) decimation filter stage, and a finite impulse response (FIR) filter stage. The GPU implementation and optimization of all the stages are studied in detail. Additionally, to handle a long-duration signal, the signal data sequence is split into segments; the overlap-save and overlap-add mechanisms are applied in the CIC stage and the FIR stage, respectively. Finally, experiments were conducted to evaluate the performance of the GPU-based DDC against a sequential CPU implementation and an OpenMP implementation (16 threads). Experimental results demonstrate that the DDC achieves significant improvements on the GPU: the maximum speedups in the NCO stage, CIC stage, and FIR stage exceed 1242, 527, and 179 times, respectively, including data transfer, kernel execution, and other processing operations, and the overall speedup of the DDC exceeds 180 times. The speedups of the GPU implementation are also far above those of the OpenMP implementation (about 2.5-6.4 times).
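The three-stage structure described in the abstract can be illustrated with a small sequential reference sketch. This is not the paper's CUDA implementation. It is a plain-Python illustration, and the function names (`nco_mix`, `cic_decimate`, `fir_filter`, `ddc`), the default of three CIC stages, the differential delay of 1, and the gain normalization are all assumptions made here for illustration. Segmenting with overlap-save and overlap-add is omitted.

```python
import cmath
import math


def nco_mix(samples, freq, sample_rate):
    """NCO stage: multiply by exp(-j*2*pi*freq*n/fs) to shift `freq` to baseband."""
    step = -2.0 * math.pi * freq / sample_rate
    return [x * cmath.exp(1j * step * n) for n, x in enumerate(samples)]


def cic_decimate(samples, rate, stages=3):
    """CIC stage: `stages` integrators, decimation by `rate`, then `stages` combs.

    Assumes a differential delay of 1 and divides by the DC gain rate**stages.
    """
    out = list(samples)
    for _ in range(stages):  # integrators run at the input rate
        acc = 0
        integrated = []
        for x in out:
            acc += x
            integrated.append(acc)
        out = integrated
    out = out[rate - 1::rate]  # keep every `rate`-th sample
    for _ in range(stages):  # combs run at the decimated rate
        prev = 0
        combed = []
        for x in out:
            combed.append(x - prev)
            prev = x
        out = combed
    gain = rate ** stages
    return [y / gain for y in out]


def fir_filter(samples, taps):
    """FIR stage: y[n] = sum_k taps[k] * x[n-k], with zero initial state."""
    return [
        sum(taps[k] * samples[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(samples))
    ]


def ddc(samples, freq, sample_rate, rate, taps, stages=3):
    """Chain the three stages: NCO mixing, CIC decimation, FIR filtering."""
    mixed = nco_mix(samples, freq, sample_rate)
    decimated = cic_decimate(mixed, rate, stages)
    return fir_filter(decimated, taps)
```

In this sketch each NCO output sample depends only on its own index, so that stage has no dependency between samples, whereas the CIC integrators carry a running sum across the whole sequence.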
