Journal: Concurrency and Computation

C2CU: a CUDA C program generator for bulk execution of a sequential algorithm

Abstract

Several important tasks, including matrix computation, signal processing, sorting, dynamic programming, encryption, and decryption, can be performed by oblivious sequential algorithms. A sequential algorithm is oblivious if the address accessed at each time step does not depend on the input data. A bulk execution of a sequential algorithm executes it for many independent inputs, either in turn or in parallel. A number of works have been devoted to designing and implementing parallel algorithms for a single input. However, none of these works evaluated the bulk execution performance of these algorithms. The first contribution of this paper is a time-optimal implementation for bulk execution of an oblivious sequential algorithm. The second contribution is a tool, named C2CU, which automatically generates a CUDA C program for bulk execution of an oblivious sequential algorithm. C2CU has been used to generate CUDA C programs for the bulk execution of the bitonic sorting, Floyd-Warshall, and Montgomery modulo multiplication algorithms. Compared with a sequential implementation on a single CPU, the generated CUDA C programs for these algorithms run 199, 54, and 78 times faster, respectively.
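The two definitions in the abstract (oblivious algorithm, bulk execution) can be illustrated with a small sketch. The plain-Python code below is only an illustration and is neither C2CU output nor the authors' implementation; the function names `bitonic_sort_oblivious` and `bulk_execute` and the `trace` parameter are hypothetical. It uses bitonic sort, one of the algorithms named above, records every pair of addresses touched, and runs the algorithm over several independent inputs in turn.

```python
def bitonic_sort_oblivious(a, trace=None):
    """Sort list `a` in place; len(a) is assumed to be a power of two.

    The index pairs (i, l) that are read and written depend only on
    len(a), never on the values stored in `a`, so the algorithm is
    oblivious. If `trace` is a list, each accessed pair is appended to it.
    """
    n = len(a)
    k = 2
    while k <= n:                      # size of the bitonic blocks being merged
        j = k // 2
        while j >= 1:                  # compare-exchange distance
            for i in range(n):
                l = i ^ j              # partner index, fixed by i and j only
                if l > i:
                    if trace is not None:
                        trace.append((i, l))
                    lo, hi = min(a[i], a[l]), max(a[i], a[l])
                    ascending = (i & k) == 0
                    # Both cells are always written, whatever the data is.
                    a[i], a[l] = (lo, hi) if ascending else (hi, lo)
            j //= 2
        k *= 2


def bulk_execute(inputs):
    """Bulk execution "in turn": run the same algorithm on each input.

    Returns one address trace per input so the traces can be compared.
    """
    traces = []
    for x in inputs:
        t = []
        bitonic_sort_oblivious(x, t)
        traces.append(t)
    return traces


if __name__ == "__main__":
    batch = [[3, 1, 2, 0], [7, 7, 5, 6], [0, 1, 2, 3]]
    traces = bulk_execute(batch)
    print(batch)
    # Every input produced the same address trace.
    print(all(t == traces[0] for t in traces))
```

Because every input follows the same address sequence, the inputs can in principle be processed in lockstep, one per thread, which is the property a bulk execution on a GPU relies on. How C2CU lays out the data in memory is not stated in the abstract, so the sketch makes no claim about it.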
