...
首页> 外文期刊>Parallel and Distributed Computing Practices >EFFICIENT MULTITHREADED ALGORITHMS FOR THE FAST FOURIER TRANSFORM
【24h】

EFFICIENT MULTITHREADED ALGORITHMS FOR THE FAST FOURIER TRANSFORM

机译:快速傅里叶变换的高效多线程算法

获取原文
获取原文并翻译 | 示例

摘要

In this paper we present fine-grained multithreaded algorithms and implementations for the Fast Fourier Transform (FFT) problem. The FFT problem has been formulated using two distinct approaches based on the dataflow concepts. The first approach, referred to as the receiver-initiated algorithm, realizes the FFT iterations as a parent-child relationship while fully exploiting the underlying parallelism. By establishing small sized threads, minimal cost overhead in thread switching and embedding the synchronization operations within a parent-child relationship, the receiver-initiated algorithm is well suited for fine-grained architectures. The second approach, referred to as the sender-initiated algorithm, follows a dataflow model based on the producer-consumer style of programming and can be adopted to different architectural parameters for achieving high performance. We show in the paper the advantages and disadvantages of these approaches on fine-grained architectures. The implementations of the proposed algorithms have been carried out on the EARTH (Efficient Architecture for Running THreads) platform which is a non-preemptive, fine-grained dataflow model. For both the algorithms, we analyze the number of remote and local threads in each processor and study its impact on the experimental results. Our implementation results show that for large number of processors, both the algorithms perform well, yielding execution times of only 10 ms for an input of 16 K data points on a 64 processor machine, assuming each processor running at 140 MHz clock speed, For certain block sizes on fixed problem size and machine size, the receiver-initiated approach performs better than the sender-initiated approach, We also present the algorithms implication on the design of fine-grained architectures.
机译:在本文中,我们提出了用于快速傅立叶变换(FFT)问题的细粒度多线程算法和实现。 FFT问题是根据数据流概念使用两种不同的方法制定的。第一种方法称为接收器启动的算法,它在充分利用底层并行性的同时,将FFT迭代实现为父子关系。通过建立小尺寸的线程,最小的线程切换成本开销以及将同步操作嵌入父子关系中,接收器启动的算法非常适合于细粒度的体系结构。第二种方法称为发送方启动的算法,它遵循基于生产者-消费者编程风格的数据流模型,并且可以采用不同的体系结构参数来实现高性能。我们在论文中展示了这些方法在细粒度架构上的优缺点。所提出算法的实现已在EARTH(运行THread的有效体系结构)平台上进行,该平台是一种非抢先的细粒度数据流模型。对于这两种算法,我们分析每个处理器中远程和本地线程的数量,并研究其对实验结果的影响。我们的实现结果表明,对于大量处理器,这两种算法都表现良好,假设在64位处理器上以每个处理器140 MHz的时钟速度运行,则输入16​​ K数据点时执行时间仅为10 ms。在固定问题大小和机器大小的块大小上,接收者发起的方法比发送者发起的方法要好。我们还介绍了算法对细粒度体系结构的设计。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号