PhiRSA: Exploiting the Computing Power of Vector Instructions on Intel Xeon Phi for RSA

Abstract

Efficient implementations of public-key cryptographic algorithms on general-purpose computing devices facilitate the application of cryptography to communication security. Existing solutions follow two different directions: implementations on GPUs achieve high throughput but suffer from high latency, while those on CPUs offer low throughput with low latency. Intel Xeon Phi is the first highly parallel coprocessor based on the Many Integrated Core (MIC) architecture, with up to 61 cores and one 512-bit Vector Processing Unit (VPU) per core, which offers the potential to achieve both high throughput and low latency. In this paper, we propose a vector-oriented Montgomery multiplication design based on the vector carry propagation chain (VCPC) method to fully exploit the computing power of vector instructions on Intel Xeon Phi. Two key features of our design sharply reduce the number of instructions: (1) the additions in Montgomery multiplication are organized into four VCPCs, which saves the overhead of handling carry bits; (2) the intermediate scalar variable q is computed in every round without breaking the flow of the VCPCs. Furthermore, we present the optimal implementation of our Montgomery multiplication design on Intel Xeon Phi, which keeps the VPUs fully pipelined and maintains carry bits in vector mask registers. Based on this, we implement RSA, named PhiRSA, and evaluate it on an Intel Xeon Phi 7120P. For 1024-, 2048- and 4096-bit RSA, PhiRSA performs 258,370, 41,803 and 5,358 decryptions per second, with latencies of 0.94, 5.84 and 45.54 ms, respectively. These results are 4.1 to 8.5 times the performance of existing RSA implementations on Intel Xeon Phi; they exhibit high throughput comparable to GPU implementations while requiring far fewer parallel tasks, and low latency comparable to CPU implementations.
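
The abstract centers on Montgomery multiplication in which a scalar q is computed in every round. As a scalar reference point only, here is a minimal word-serial sketch in Python of the textbook CIOS (Coarsely Integrated Operand Scanning) variant, assuming 32-bit words (a 512-bit vector holds sixteen of them). It is not PhiRSA's vectorized VCPC design; the helper names `to_limbs`, `from_limbs` and `mont_mul` are hypothetical and serve only to show where q appears in each round and why the lowest word can be dropped.

```python
# Word-serial Montgomery multiplication (textbook CIOS variant), a scalar sketch.
# Assumption: 32-bit words, so a 512-bit vector would hold 16 of them.
W = 32
MASK = (1 << W) - 1


def to_limbs(x, n):
    """Split x into n little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_limbs(words):
    """Reassemble little-endian W-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(words))


def mont_mul(a, b, m, n):
    """Return a * b * R^-1 mod m with R = 2^(W*n); requires m odd and a, b < m."""
    m_prime = (-pow(m, -1, 1 << W)) & MASK      # -m^-1 mod 2^W (Python 3.8+)
    A, B, M = to_limbs(a, n), to_limbs(b, n), to_limbs(m, n)
    t = [0] * (n + 2)
    for i in range(n):                          # one round per word of b
        # Step 1: t += a * b_i, rippling the carry word by word.
        carry = 0
        for j in range(n):
            s = t[j] + A[j] * B[i] + carry
            t[j], carry = s & MASK, s >> W
        s = t[n] + carry
        t[n], t[n + 1] = s & MASK, s >> W
        # Step 2: this round's scalar q, chosen so that t + q*m is divisible by 2^W.
        q = (t[0] * m_prime) & MASK
        # Step 3: t = (t + q*m) / 2^W; the low word is zero and is dropped.
        carry = (t[0] + q * M[0]) >> W
        for j in range(1, n):
            s = t[j] + q * M[j] + carry
            t[j - 1], carry = s & MASK, s >> W
        s = t[n] + carry
        t[n - 1] = s & MASK
        t[n] = t[n + 1] + (s >> W)
    u = from_limbs(t[:n + 1])                   # here u < 2m
    return u - m if u >= m else u
```

To use it for an ordinary modular product, operands would first be mapped into Montgomery form (x·R mod m); in RSA the same routine is then applied at every squaring and multiplication of the modular exponentiation, so no division by m is ever needed.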
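
The abstract notes that carry bits are maintained in vector mask registers. The following Python simulation is a generic illustration of that idea, not the paper's VCPC scheme: a 512-bit addition is done as sixteen independent 32-bit lane additions whose carry-outs are collected into a 16-bit mask, and the carries are resolved in a separate pass. The lane count and word size are assumptions matching a 512-bit VPU, and the names `split`, `join`, `add_lanes`, `resolve_carries` and `add512` are hypothetical.

```python
LANES, W = 16, 32                  # a 512-bit vector viewed as 16 x 32-bit lanes
WORD_MASK = (1 << W) - 1
LANE_MASK = (1 << LANES) - 1       # a 16-bit mask, one bit per lane


def split(x):
    """Little-endian 32-bit lanes of a 512-bit integer."""
    return [(x >> (W * i)) & WORD_MASK for i in range(LANES)]


def join(lanes):
    return sum(v << (W * i) for i, v in enumerate(lanes))


def add_lanes(x, y):
    """Lane-wise add with no cross-lane carry; bit i of the mask = carry out of lane i."""
    sums, carries = [], 0
    for i in range(LANES):
        s = x[i] + y[i]
        sums.append(s & WORD_MASK)
        carries |= (s >> W) << i
    return sums, carries


def resolve_carries(sums, carries):
    """Feed each lane's carry into the next lane until no carries remain.
    Returns the lanes and the carry leaving the top lane."""
    carry_out = 0
    while carries:
        carry_out += carries >> (LANES - 1)          # carry leaving lane 15
        incoming = (carries << 1) & LANE_MASK        # carry from lane i goes to lane i+1
        sums, carries = add_lanes(sums, [(incoming >> i) & 1 for i in range(LANES)])
    return sums, carry_out


def add512(x, y):
    sums, carries = add_lanes(split(x), split(y))
    sums, carry_out = resolve_carries(sums, carries)
    return join(sums) + (carry_out << (W * LANES))
```

Deferring carries this way keeps the sixteen lane additions independent, so the vector unit is not serialized on a carry at every word. The worst case, a carry rippling through all lanes (for example `add512(2**512 - 1, 1)`), needs up to 16 extra passes, but with random operands a carry propagates past a lane only when that lane holds all ones, which is rare.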
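
The throughput and latency figures can be cross-checked against each other with Little's law (average number of operations in flight = throughput × latency). The short calculation below uses only the numbers quoted in the abstract; `RESULTS` and `in_flight` are illustrative names.

```python
# (RSA modulus size in bits, decryptions per second, latency in ms), from the abstract
RESULTS = [(1024, 258_370, 0.94), (2048, 41_803, 5.84), (4096, 5_358, 45.54)]


def in_flight(throughput_per_s, latency_ms):
    """Little's law: average concurrency = throughput x latency."""
    return throughput_per_s * latency_ms / 1000.0


for bits, tput, lat in RESULTS:
    print(f"RSA-{bits}: ~{in_flight(tput, lat):.0f} decryptions in flight")
# RSA-1024: ~243 decryptions in flight
# RSA-2048: ~244 decryptions in flight
# RSA-4096: ~244 decryptions in flight
```

All three key sizes come out at roughly 243 to 244 concurrent decryptions. One plausible reading, which the abstract does not state, is one decryption per hardware thread: the Xeon Phi 7120P has 61 cores with 4 hardware threads each, 244 threads in total. That concurrency is modest compared with the very large batches GPU implementations typically rely on, which fits the abstract's claim of needing far fewer parallel tasks.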

Bibliographic Record

Source
International Conference on Selected Areas in Cryptography | 2017 | 580p | 19 pages
Authors
Yuan Zhao; Wuqiong Pan; Jingqiang Lin; Peng Liu; Cong Xue; Fangyu Zheng
Original format: PDF
Chinese Library Classification: TN918.4-53
Keywords
Intel Xeon Phi; Vectorization; Montgomery multiplication; RSA; Performance

Similar Literature

1. Exploiting Parallelism and Vectorisation in Breadth-First Search for the Intel Xeon Phi [J]. Mireya Paredes, Graham Riley, Mikel Luján. IEEE Transactions on Parallel and Distributed Systems, 2020, no. 1.

2. Benchmarking Performance of a Hybrid Intel Xeon/Xeon Phi System for Parallel Computation of Similarity Measures Between Large Vectors [J]. Pawel Czarnul. International Journal of Parallel Programming, 2017, no. 5.

3. Effective SIMD Vectorization for Intel Xeon Phi Coprocessors [J]. Xinmin Tian, Hideki Saito, Serguei V. Preis. Scientific Programming, 2015, no. 4.

4. PhiRSA: Exploiting the Computing Power of Vector Instructions on Intel Xeon Phi for RSA [C]. Yuan Zhao, Wuqiong Pan, Jingqiang Lin. International Conference on Selected Areas in Cryptography, 2017.

5. An Analysis of Variation Between Cores for Intel Xeon Phi Knights Corner and Xeon Phi Knights Landing [D]. Jamar Robinson. 2017.

6. Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™ [O]. Jeremias M. Gomes, George Teodoro, Alba de Melo.

7. Exploiting Parallelism and Vectorisation in Breadth-First Search for the Intel Xeon Phi [O]. Mireya Paredes, Graham Riley, Mikel Luján. 2020.
