Journal: Parallel Computing

Intel Cilk Plus for complex parallel algorithms: 'Enormous Fast Fourier Transforms' (EFFT) library


Abstract

In this paper we demonstrate the methodology for parallelizing the computation of large one-dimensional discrete fast Fourier transforms (DFFTs) on multi-core Intel Xeon processors. DFFTs based on the recursive Cooley-Tukey method have to control cache utilization, memory bandwidth and vector hardware usage, and at the same time scale across multiple threads or compute nodes. Our method builds on a single-threaded Intel Math Kernel Library (MKL) implementation of real-to-complex DFFT, and uses the Intel Cilk Plus framework for thread parallelism. We demonstrate the ability of Intel Cilk Plus to handle parallel recursion with nested loop-centric parallelism without tuning the code to the number of cores or cache metrics. The result of our work is a library called EFFT that performs 1D DFTs of size 2^N for N >= 21 faster than the corresponding Intel MKL parallel DFT implementation by up to 1.5x, and faster than FFTW by up to 2.5x. The code of EFFT is available for free download under the GPLv3 license.
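
As a rough illustration of the recursive Cooley-Tukey structure the abstract refers to, the sketch below implements a plain radix-2 decimation-in-time FFT in Python. It is not EFFT's code: EFFT builds on Intel MKL and Intel Cilk Plus, neither of which is used here, and the names `fft_recursive` and `dft_naive` are hypothetical. The point is only that the two half-size sub-transforms are independent of each other, which is the kind of recursion a task-parallel framework could run concurrently.

```python
import cmath


def fft_recursive(x):
    """Radix-2 decimation-in-time Cooley-Tukey FFT (illustrative sketch).

    The input length must be a power of two, e.g. 2**N.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]

    # The two half-size transforms do not depend on each other, so a
    # task-parallel runtime could evaluate them concurrently. Here they
    # simply run one after the other.
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])

    # Butterfly step: combine the halves using the twiddle factors.
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def dft_naive(x):
    """Direct O(n^2) evaluation of the DFT definition, used as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Comparing `fft_recursive` against `dft_naive` on a small input is a quick sanity check. A production implementation for sizes of 2^21 and beyond would also need the cache, memory-bandwidth and vectorization control that the abstract mentions, which this sketch does not attempt.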
