Unified Designs for High Performance LDPC Decoding on GPGPU

Bo-Cheng Charles Lai; Chia-Ying Lee; Tsou-Han Chiu; Hsien-Kai Kuo; Chun-Kai Chang


Abstract

Modern GPGPUs enable massively parallel computing with a degree of programmability that can exploit the highly parallel nature of LDPC decoding. Previous works customized their GPGPU designs to the specific execution attributes of a particular LDPC decoding matrix, so supporting a different LDPC decoding matrix requires either substantial rework of the existing program or a brand-new parallel design. This paper proposes two unified designs that achieve high performance for both regular and irregular LDPC decoding on a GPGPU. The first design introduces a node-based scheme with a versatile translation array mechanism that efficiently handles the complex data access patterns of different LDPC decoding matrices. The second design proposes an edge-based parallel paradigm that uses a more intuitive data layout. Because a Tanner graph has more edges than nodes, the edge-based design also offers higher computation parallelism when the number of concurrent codewords is limited. With the proposed unified designs, designers can achieve high-performance LDPC decoding without needing to know the type of LDPC matrix. Experiments on a GTX 470 GPGPU demonstrate up to a 134.56x runtime improvement over designs on a high-end CPU, and the maximum throughput reaches 80.25 Mbps. Compared with previous customized designs, the proposed systematic designs achieve better performance while removing the customization effort.
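
The contrast between the node-based and edge-based schemes can be illustrated with a minimal sketch. This is not the paper's implementation: the toy matrix `H`, the offset array `row_start` (standing in loosely for a translation-style index array), the min-sum update rule, and all function names are assumptions chosen for illustration, and plain Python loops stand in for GPU threads.

```python
# Toy parity-check matrix (hypothetical, irregular: row degrees 3, 3, 4).
H = [
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
]

# Edge list of the Tanner graph: one (check, variable) pair per nonzero of H.
edges = [(c, v) for c, row in enumerate(H) for v, bit in enumerate(row) if bit]

# Node-based layout: edges of check c occupy row_start[c] .. row_start[c+1]-1.
# This offset array is an illustrative index table, not the paper's
# actual translation array.
row_start = [0]
for row in H:
    row_start.append(row_start[-1] + sum(row))


def min_sum(others):
    """Min-sum check-node rule: product of signs times minimum magnitude."""
    sign = 1
    for x in others:
        if x < 0:
            sign = -sign
    return sign * min(abs(x) for x in others)


def check_update_node_based(v2c):
    """One work item per check node; each loops over its own edges."""
    c2v = [0.0] * len(edges)
    for c in range(len(H)):  # on a GPU: one thread per check node
        lo, hi = row_start[c], row_start[c + 1]
        for e in range(lo, hi):
            others = [v2c[k] for k in range(lo, hi) if k != e]
            c2v[e] = min_sum(others)
    return c2v


def check_update_edge_based(v2c):
    """One work item per edge; each computes only its own outgoing message."""
    c2v = [0.0] * len(edges)
    for e, (c, _v) in enumerate(edges):  # on a GPU: one thread per edge
        others = [v2c[k] for k, (ck, _) in enumerate(edges)
                  if ck == c and k != e]
        c2v[e] = min_sum(others)
    return c2v


# Variable-to-check messages, one per edge (arbitrary example values).
v2c = [1.5, -0.5, 2.0, 0.8, -1.2, -0.3, 0.9, 1.1, -0.7, 0.4]
node_result = check_update_node_based(v2c)
edge_result = check_update_edge_based(v2c)
```

In this toy graph there are 10 edges but only 8 nodes (3 check nodes plus 5 variable nodes), so the edge-based loop exposes more independent work items than a node-based one, which is the effect the abstract describes for a limited number of concurrent codewords. Both functions produce the same messages; only the unit of parallel work differs.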


Bibliographic details

Source
IEEE Transactions on Computers | 2016, Issue 12 | pp. 3754-3765 | 12 pages
Authors
Bo-Cheng Charles Lai; Chia-Ying Lee; Tsou-Han Chiu; Hsien-Kai Kuo; Chun-Kai Chang
Affiliations

National Chiao Tung University, Hsinchu, Taiwan;

National Chiao Tung University, Hsinchu, Taiwan;

MediaTek Corp., Hsinchu, Taiwan;

MediaTek Corp., Hsinchu, Taiwan;

National Chiao Tung University, Hsinchu, Taiwan;

Original format: PDF
Language: English
Keywords
Decoding; Parallel processing; Iterative decoding; Message passing; Computer architecture; Message systems;

Date added to database: 2022-08-17 13:36:06

