Journal of Supercomputing

Parallel implementation of Nussbaumer algorithm and number theoretic transform on a GPU platform: application to qTESLA



Abstract

Among the popular post-quantum schemes, lattice-based cryptosystems have received renewed interest because they are relatively simple, highly parallelizable, and provably secure under a worst-case hardness assumption. However, polynomial multiplication over rings is the most time-consuming operation in most lattice-based cryptosystems. To further improve the performance of lattice-based cryptosystems for large-scale use, polynomial multiplication must be implemented in parallel. Polynomial multiplication can be performed using either the number theoretic transform (NTT) or the Nussbaumer algorithm. However, the Nussbaumer algorithm is inherently serial, and how to implement the NTT efficiently on a GPU platform using the various indexing methods has not yet been established. In this paper, we explore the best combination of indexing methods for implementing the NTT on a GPU platform, as well as an efficient way to parallelize the Nussbaumer algorithm. Our results suggest that combining the Gentleman-Sande and Cooley-Tukey (GS-CT) indexing methods produces the best NTT performance on an RTX2060 GPU (422,638 polynomial multiplications per second). We also present a technique that parallelizes the Nussbaumer algorithm while halving the number of non-coalesced global memory accesses. To the best of our knowledge, this is the first GPU implementation of the Nussbaumer algorithm, and it outperforms the best NTT implementation above (GS-CT) by 14.5%. For illustration purposes, the proposed GPU implementation techniques are applied to qTESLA, a state-of-the-art lattice-based signature scheme. We emphasize that the proposed techniques are not specific to any cryptosystem; they can easily be adapted to other lattice-based cryptosystems.
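The NTT route to polynomial multiplication can be illustrated with a small CPU sketch. This is not the authors' GPU code. It multiplies two polynomials in Z_q[x]/(x^n + 1), the kind of ring qTESLA works in, using Gentleman-Sande butterflies for the forward transform and Cooley-Tukey butterflies for the inverse. This is one plausible reading of the "GS-CT" pairing, assumed here. Because the GS transform emits its output in bit-reversed order and the CT inverse accepts exactly that order, no explicit bit-reversal pass is needed. The ψ-twist used to turn a cyclic NTT into a negacyclic product, the toy parameters (n = 8, q = 17), and all function names are illustrative assumptions. On a GPU, the independent butterflies within one stage would typically be spread across threads.

```python
def find_psi(n, q):
    """Primitive 2n-th root of unity mod prime q (assumes n is a power of 2 and 2n | q - 1)."""
    assert (q - 1) % (2 * n) == 0
    for x in range(2, q):
        g = pow(x, (q - 1) // (2 * n), q)
        if pow(g, n, q) == q - 1:  # g^n = -1, so the order of g is exactly 2n
            return g
    raise ValueError("no primitive 2n-th root of unity found")


def ntt_gs(a, omega, q):
    """Cyclic NTT with Gentleman-Sande butterflies: natural-order in, bit-reversed out."""
    a = list(a)
    n = len(a)
    half = n // 2
    while half >= 1:
        w_step = pow(omega, n // (2 * half), q)  # root for sub-transforms of size 2*half
        for start in range(0, n, 2 * half):
            w = 1
            for j in range(start, start + half):
                u, v = a[j], a[j + half]
                a[j] = (u + v) % q
                a[j + half] = (u - v) * w % q  # GS: twiddle applied after the subtraction
                w = w * w_step % q
        half //= 2
    return a


def intt_ct(a, omega_inv, q):
    """Unscaled inverse NTT with Cooley-Tukey butterflies: bit-reversed in, natural-order out."""
    a = list(a)
    n = len(a)
    half = 1
    while half < n:
        w_step = pow(omega_inv, n // (2 * half), q)
        for start in range(0, n, 2 * half):
            w = 1
            for j in range(start, start + half):
                u, v = a[j], a[j + half] * w % q  # CT: twiddle applied before the add/sub
                a[j] = (u + v) % q
                a[j + half] = (u - v) % q
                w = w * w_step % q
        half *= 2
    return a


def polymul_ntt(a, b, q):
    """Multiply a and b in Z_q[x]/(x^n + 1): GS forward NTT, point-wise product, CT inverse."""
    n = len(a)
    psi = find_psi(n, q)
    omega = psi * psi % q  # primitive n-th root of unity
    psi_pows = [pow(psi, i, q) for i in range(n)]
    # Scaling coefficient i by psi^i makes the cyclic transform compute a negacyclic product.
    fa = ntt_gs([x * p % q for x, p in zip(a, psi_pows)], omega, q)
    fb = ntt_gs([x * p % q for x, p in zip(b, psi_pows)], omega, q)
    fc = [x * y % q for x, y in zip(fa, fb)]  # both operands share the same bit-reversed order
    c = intt_ct(fc, pow(omega, -1, q), q)
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    # Undo the factor n from the inverse transform and the psi^i twist.
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def polymul_schoolbook(a, b, q):
    """Reference O(n^2) product in Z_q[x]/(x^n + 1), using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    # Toy parameters (assumption): n = 8, q = 17, chosen because 2n = 16 divides q - 1.
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [8, 7, 6, 5, 4, 3, 2, 1]
    print(polymul_ntt(a, b, 17) == polymul_schoolbook(a, b, 17))  # True
```

Pairing the two butterfly types lets the point-wise product operate directly on bit-reversed data. A real implementation would also precompute the twiddle tables and use a faster modular reduction such as Montgomery or Barrett instead of Python's `%`.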
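The Nussbaumer route can be sketched the same way. This follows the textbook formulation rather than the paper, and it is serial, so it does not show the coalescing technique. With n = m·r and m ≥ r, each input is split into r polynomials in y = x^r, so the product lives in R[x]/(x^r - y) with R = Z_q[y]/(y^m + 1). In R, y^(m/r) is a primitive 2r-th root of unity. As a result, the length-2r transform needs only signed rotations of coefficient vectors, never multiplications mod q. The 2r point-wise products in R are the only real multiplications, and no NTT-friendly modulus is required, only that 2r be invertible mod q. For brevity, the transform is evaluated directly in O(r^2) ring operations instead of with radix-2 butterflies, and the point-wise products use schoolbook multiplication instead of recursion. All names are hypothetical.

```python
def ring_mul(a, b, q):
    """Schoolbook product in Z_q[y]/(y^m + 1), using y^m = -1 (also a reference for any length)."""
    m = len(a)
    c = [0] * m
    for i in range(m):
        for j in range(m):
            k = i + j
            if k < m:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - m] = (c[k - m] - a[i] * b[j]) % q
    return c


def rot(p, s, q):
    """Multiply p in Z_q[y]/(y^m + 1) by y^s: a rotation with sign flips, no multiplications."""
    m = len(p)
    out = [0] * m
    for i, coef in enumerate(p):
        k = (i + s) % (2 * m)  # y^(2m) = 1
        if k < m:
            out[k] = coef % q
        else:
            out[k - m] = (-coef) % q  # y^m = -1
    return out


def nussbaumer(a, b, q):
    """Negacyclic product a*b in Z_q[x]/(x^n + 1), n a power of two, 2r invertible mod q."""
    n = len(a)
    k = n.bit_length() - 1
    m = 1 << ((k + 1) // 2)  # m >= r, so y^(m/r) is a primitive 2r-th root of unity in R
    r = n // m
    length = 2 * r           # products of degree < r polynomials fit without wrap-around
    step = m // r

    def split(p):
        # Column j is sum_t p[j + r*t] * y^t; zero-pad to 2r columns.
        cols = [[p[j + r * t] % q for t in range(m)] for j in range(r)]
        return cols + [[0] * m for _ in range(r)]

    def transform(cols, sign):
        # Length-2r DFT over R with root y^(sign*step): rotations and additions only.
        out = []
        for u in range(length):
            acc = [0] * m
            for v in range(length):
                term = rot(cols[v], sign * step * u * v, q)
                acc = [(x + t) % q for x, t in zip(acc, term)]
            out.append(acc)
        return out

    fa, fb = transform(split(a), 1), transform(split(b), 1)
    prod = [ring_mul(x, y, q) for x, y in zip(fa, fb)]  # the only true multiplications
    cols = transform(prod, -1)
    len_inv = pow(length, -1, q)
    c = [0] * n
    for j in range(r):
        # Undo the 1/(2r) scaling and substitute x^r = y: column j becomes C_j + y * C_{j+r}.
        col = [(x + t) * len_inv % q for x, t in zip(cols[j], rot(cols[j + r], 1, q))]
        for t in range(m):
            c[j + r * t] = col[t]
    return c


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    b = [8, 7, 6, 5, 4, 3, 2, 1]
    print(nussbaumer(a, b, 17) == ring_mul(a, b, 17))  # True
```

The serial bottleneck the abstract refers to is visible here: every stage of the transform reads and rewrites whole coefficient vectors of R. How those accesses are laid out in GPU global memory is exactly what the paper's parallelization targets.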
