Performance Evaluation of Instruction Set Extensions for Long Integer Modular Arithmetic on a SPARC V8 Processor

Abstract

Many important algorithms for public-key cryptography rely on computation-intensive arithmetic operations like modular exponentiation on very long integers, typically in the range of 512 to 2048 bits. Modular exponentiation is generally realized through a sequence of modular multiplications and spends the majority of its execution time in simple inner loops. Speeding up these performance-critical inner-loop operations with custom instructions therefore has a significant impact on the total execution time of public-key cryptosystems. In this paper we analyze the performance of instruction set extensions for long integer arithmetic on a SPARC V8 processor. We discuss various implementation options and optimization opportunities for both modular multiplication and exponentiation. In particular, we introduce a partial loop unrolling (PLU) technique for modular multiplication which achieves large performance gains at the cost of a moderate increase in code size, while maintaining the full flexibility of a "rolled-loop" implementation. In addition, we study window methods for modular exponentiation and analyze their impact on performance and memory requirements. Our experimental results, obtained with an FPGA prototype of the LEON-2 SPARC V8 core, show that a full 1024-bit modular exponentiation can be performed in about 12.5 · 10^6 clock cycles, which is a reasonable value for embedded devices like smart cards or sensor nodes.
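To make the "simple inner loops" mentioned in the abstract concrete, the following is a minimal Python sketch of word-level long-integer multiplication, assuming 32-bit words as on a SPARC V8. It is not the paper's implementation: the names `to_words`, `from_words`, and `mul_words` are hypothetical, and the sketch shows only plain multiplication, without the modular reduction or the custom instructions the paper evaluates. The inner-loop body is the multiply-accumulate step that this kind of instruction set extension typically targets.

```python
W = 32                      # assumed word size in bits (SPARC V8 is a 32-bit architecture)
MASK = (1 << W) - 1


def to_words(x, n):
    """Split a non-negative integer into n little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_words(words):
    """Reassemble little-endian W-bit words into a Python integer."""
    return sum(w << (W * i) for i, w in enumerate(words))


def mul_words(a, b):
    """Operand-scanning (schoolbook) multiplication of two word arrays."""
    n, m = len(a), len(b)
    r = [0] * (n + m)
    for i in range(n):
        carry = 0
        for j in range(m):
            # Inner loop: one multiply-accumulate per word pair.
            # t always fits in 2*W bits, so the carry fits in one word.
            t = a[i] * b[j] + r[i + j] + carry
            r[i + j] = t & MASK
            carry = t >> W
        r[i + m] = carry
    return r
```

For two 1024-bit operands, `to_words(x, 32)` yields 32 words each, so the inner-loop body runs 32 × 32 = 1024 times per multiplication, which is why its per-iteration cost dominates the total.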
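The window methods for modular exponentiation mentioned in the abstract trade precomputed table memory for fewer multiplications. The sketch below shows a generic fixed-window (k-ary) variant in Python. It is an illustration under stated assumptions, not the algorithm from the paper: the function name `modexp_window` and the default window size `k=4` are hypothetical, and the built-in `%` operator stands in for whatever modular multiplication routine an embedded implementation would use.

```python
def modexp_window(base, exp, mod, k=4):
    """Fixed-window (k-ary) modular exponentiation: base**exp mod mod.

    Assumes exp >= 0 and mod >= 1. The table holds 2**k entries,
    which is the memory cost of the window method.
    """
    if mod == 1:
        return 0
    # Precompute base**0 .. base**(2**k - 1) modulo mod.
    table = [1] * (1 << k)
    for i in range(1, 1 << k):
        table[i] = (table[i - 1] * base) % mod

    result = 1
    num_windows = (exp.bit_length() + k - 1) // k
    # Scan the exponent from the most significant window down.
    for w in reversed(range(num_windows)):
        for _ in range(k):
            result = (result * result) % mod      # k squarings per window
        digit = (exp >> (w * k)) & ((1 << k) - 1)
        if digit:
            result = (result * table[digit]) % mod  # at most one multiply per window
    return result
```

With a 1024-bit exponent and k = 4, the loop performs 1024 squarings and at most 256 table multiplications, compared with roughly 512 multiplications on average for bit-by-bit square-and-multiply. The cost is a table of 16 precomputed values, each as long as the modulus.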
