Efficient parallel implementation of three-point Viterbi decoding algorithm on CPU, GPU, and FPGA

Rongchun Li; Yong Dou; Dan Zou


Abstract

In wireless communication, the Viterbi decoding algorithm (VDA) is one of the most popular channel decoding algorithms and is widely used in WLAN, WiMAX, and 3G communications. However, the throughput of a Viterbi decoder is constrained by its convolutional characteristic. Recently, the three-point VDA (TVDA) was proposed to solve this problem. In TVDA, the whole procedure is divided into three phases: the forward, trace-back, and decoding phases. In this paper, we analyze the parallelism of TVDA and propose parallel TVDA implementations on the multi-core CPU, graphics processing unit (GPU), and field programmable gate array (FPGA). We demonstrate approaches that fully exploit its performance potential on CPU, GPU, and FPGA computing platforms. For CPU platforms, we apply two optimization methods, single instruction multiple data (SIMD) and multithreading, to gain over 145× speedup over the naive CPU version on a quad-core CPU platform. For GPU platforms, we propose a combination of cached memory optimization, coalesced global memory accesses, a codeword packing scheme, and asynchronous data transfer, achieving a throughput of 404.65 Mbps, a 12× speedup over the initial GPU version on an NVIDIA GeForce GTX580 card, and a 7× speedup over an Intel quad-core i5-2300 CPU; the two devices are from the same manufacturing year and both run fully optimized schemes. In addition, for FPGA platforms, we customize a radix-4 pipelined architecture for the TVDA in a 45-nm FPGA chip from Xilinx (XC6VLX760). At a 209.15-MHz clock rate, it achieves a throughput of 418.30 Mbps. Finally, we discuss the performance evaluation and efficiency comparison of the different flexible architectures for real-time Viterbi decoding in terms of decoding throughput, power consumption, optimization schemes, programming cost, and price.
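The three phases named in the abstract (forward, trace-back, decoding) can be illustrated on a plain, sequential hard-decision Viterbi decoder. The sketch below is not the TVDA or any of the parallel implementations from the paper. It assumes a rate-1/2, constraint-length-3 convolutional code with generators (7, 5) in octal, which the abstract does not specify, and all function names (`step`, `encode`, `decode`) are hypothetical.

```python
# Minimal sequential Viterbi sketch (assumed code: rate 1/2, K = 3, generators 7 and 5 octal).
# It only illustrates the forward / trace-back / decoding phases; it is not the paper's TVDA.

G = (0b111, 0b101)   # assumed generator polynomials
N_STATES = 4         # 2^(K-1) trellis states


def parity(x):
    """Return the XOR of all bits of x."""
    return bin(x).count("1") & 1


def step(state, bit):
    """One encoder transition: returns (next_state, (out0, out1))."""
    reg = (bit << 2) | state
    out = tuple(parity(reg & g) for g in G)
    return reg >> 1, out


def encode(bits):
    """Encode bits and append two zero tail bits so the encoder ends in state 0."""
    state, coded = 0, []
    for b in list(bits) + [0, 0]:
        state, sym = step(state, b)
        coded.extend(sym)
    return coded


def decode(received):
    """Hard-decision Viterbi decoding of a zero-terminated codeword."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)   # encoder starts in state 0
    survivors = []

    # Forward phase: add-compare-select over every trellis stage.
    for t in range(len(received) // 2):
        r0, r1 = received[2 * t], received[2 * t + 1]
        new_metrics = [inf] * N_STATES
        prev_state = [0] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == inf:
                continue
            for b in (0, 1):
                ns, (o0, o1) = step(s, b)
                m = metrics[s] + (o0 != r0) + (o1 != r1)   # Hamming branch metric
                if m < new_metrics[ns]:
                    new_metrics[ns] = m
                    prev_state[ns] = s
        metrics = new_metrics
        survivors.append(prev_state)

    # Trace-back phase: follow survivor pointers from the known final state 0.
    state = 0
    path = [state]
    for prev_state in reversed(survivors):
        state = prev_state[state]
        path.append(state)
    path.reverse()

    # Decoding phase: the input bit of each transition is the top bit of the next state.
    bits = [s >> 1 for s in path[1:]]
    return bits[:-2]   # drop the two tail bits
```

For example, encoding a short message with `encode`, flipping one coded bit, and passing the result to `decode` should recover the original message, since this assumed code has free distance 5 and can correct a single isolated error.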


Bibliographic record

Source
Concurrency and Computation: Practice and Experience | 2014, issue 3 | pp. 821-840 | 20 pages
Authors
Rongchun Li; Yong Dou; Dan Zou
Author affiliations

National Laboratory for Parallel and Distribution Processing, National University of Defense Technology, Changsha, China (all three authors)

Original format: PDF
Language: English
Keywords
Viterbi; SDR; SSE; OpenMP; GPU; CUDA; FPGA


Similar literature


1. Efficient implementation of Sobel edge detection algorithm on CPU, GPU and FPGA [J]. Marwa Chouchene, Fatma Ezahra Sayadi, Yahia Said. International journal of advanced media and communication. 2014, issue 2a3
2. FPGA Implementation of High Speed and Low Power Viterbi Decoder Using Reverse Algorithm of Convolution Encoder [J]. Ramesh K, Sudha S. Journal of computational and theoretical nanoscience. 2017, issue 12
3. FPGA, GPU, and CPU implementations of Jacobi algorithm for eigenanalysis [J]. Mustafa U. Torun, Onur Yilmaz, Ali N. Akansu. Journal of Parallel and Distributed Computing. 2016, issue octa
4. Design and implementation of a parallel processing Viterbi decoder using FPGA [C]. Lei-ou Wang, Zhe-ying Li. 2010 International Conference on Artificial Intelligence and Education. 2010
5. Efficient Viewshed Computation Algorithms on GPUs and CPUs [D]. Qarah, Faisal F. 2020
6. Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU [O]. Jinwei Wang, Xirong Ma, Yuanping Zhu
7. Power-Efficient Viterbi Decoder Architecture and Field Programmable Gate Arrays FPGA Implementation [O]. Burcu Ozbay, Serap Cekli. 2018
