Parallel SIMD CPU and GPU Implementations of Berlekamp-Massey Algorithm and Its Error Correction Application

Mohebbi Hamidreza


Abstract

The Berlekamp-Massey algorithm (BMA) finds the shortest linear feedback shift register for a binary input sequence. A wide range of applications, such as cryptography and digital signal processing, use this algorithm. This research proposes novel parallel implementations that exploit the mechanisms offered by heterogeneous CPU and GPU hardware in order to achieve the best possible performance for the BMA. The proposed bitwise implementation of the BMA is almost 35 times faster than state-of-the-art implementations. Further improvement is achieved by using SIMD instructions, which provide data-level parallelism. This new approach can be 4.6 and 35 times faster than a bitwise CPU implementation and state-of-the-art implementations, respectively. To achieve the highest possible speedup on a multi-core architecture, a multi-threaded implementation is introduced. By leveraging OpenMP, we obtained a speedup of 10 times on a 12-core server. A GPU, with its thousands of processing cores, can bring a large speedup over the best CPU implementation. Two other parallel mechanisms offered by the GPU are concurrent kernel execution and streaming. They achieve speedups of 14.5 and 2.2 times compared to the serial CPU and typical CUDA implementations, respectively. The performance of the OpenMP code with SIMD instructions is also compared with the GPU stream implementation. The effectiveness of the proposed method is evaluated in a real-world error correction application, where it achieves a speedup of 6.8 times.
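To make the abstract's first sentence concrete, here is a minimal sketch of the textbook Berlekamp-Massey algorithm over GF(2). It is not the paper's implementation. The function name `berlekamp_massey_gf2` and the choice of Python integers as packed bit vectors are assumptions made for illustration. Packing the polynomial and the recent input bits into integers lets the discrepancy be computed with one AND plus a parity check, which is the basic idea behind a bitwise implementation. A SIMD or GPU version would apply the same operations to wider registers or to many sequences at once.

```python
def berlekamp_massey_gf2(bits):
    """Return (L, c) for a binary sequence.

    L is the linear complexity (length of the shortest LFSR).
    c is the connection polynomial C(x) packed into an int:
    bit i of c is the coefficient of x^i, so C(x) = 1 + c_1 x + ... + c_L x^L
    and the sequence satisfies s_n = c_1 s_{n-1} ^ ... ^ c_L s_{n-L}.
    """
    c = 1   # current connection polynomial C(x)
    b = 1   # copy of C(x) from before the last length change
    L = 0   # current LFSR length
    m = 1   # number of steps since the last length change
    w = 0   # reversed window of the input: bit i of w is s_{n-i}

    for n, bit in enumerate(bits):
        w = (w << 1) | (bit & 1)
        # Discrepancy d = s_n ^ c_1 s_{n-1} ^ ... ^ c_L s_{n-L},
        # computed as the parity of the bitwise AND of c and w.
        d = bin(c & w).count("1") & 1
        if d == 0:
            m += 1
        elif 2 * L <= n:
            # The LFSR must grow: remember the old polynomial in b.
            t = c
            c ^= b << m
            L = n + 1 - L
            b = t
            m = 1
        else:
            # Fix the polynomial without changing the length.
            c ^= b << m
            m += 1
    return L, c
```

For example, the 16-bit sequence `[1,0,0,0,1,1,1,1,0,1,0,1,1,0,0,1]`, generated by the recurrence s_n = s_{n-1} XOR s_{n-4}, yields `(4, 0b10011)`, i.e. C(x) = 1 + x + x^4.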


Bibliographic record

Source: International Journal of Parallel Programming, 2019, Issue 1, pp. 137-160 (24 pages)
Author: Mohebbi Hamidreza
Affiliation: Univ Massachusetts, Comp Sci Dept, Boston, MA 02125 USA
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: Coding theory; Linear complexity; Berlekamp-Massey algorithm; Parallel computing; SIMD; GPU

Similar literature

1. Parallel SIMD CPU and GPU Implementations of Berlekamp-Massey Algorithm and Its Error Correction Application [J]. Mohebbi Hamidreza. International Journal of Parallel Programming, 2019, Issue 1.
2. CPU and GPU behaviour modelling versus sequential and parallel bias field correction fuzzy C-means algorithm implementations [J]. Bouchaib Cherradi, Noureddine Ait Ali, Ahmed El Abbassi. Contemporary Engineering Sciences, 2017, Issue 9a12.
3. A cost-optimal parallel algorithm for the 0-1 knapsack problem and its performance on multicore CPU and GPU implementations [J]. Kenli Li, Jing Liu, Lanjun Wan. Parallel Computing, 2015, Issue mara.
4. Mixed serial/parallel hardware implementation of the Berlekamp-Massey algorithm for BCH decoding in Flash controller applications [C]. Freudenberger Jurgen, Spinner Jens. 2012 International Symposium on Signals, Systems, and Electronics, 2012.
5. Optimization techniques for mapping algorithms and applications onto CUDA GPU platforms and CPU-GPU heterogeneous platforms [D]. Wu, Jing. 2014.
6. SIML: A Fast SIMD Algorithm for Calculating LINGO Chemical Similarities on GPUs and CPUs [O]. Imran S. Haque, Vijay S. Pande, W. Patrick Walters.
7. Multi-GPU Implementations of Parallel 3D Sweeping Algorithms with Application to Geological Folding [O]. Krishnasamy, Ezhilmathi; Sourouri, Mohammed; Cai, Xing. 2015.
