
QrnPro: New Processor Architecture for Accelerating Quran Applications

Abstract

Quran applications include image/video processing, voice recognition, and data encryption/decryption, all of which rely on data parallelism. These applications are characterized by structured, regular computations on large data sets. This paper proposes a new processor architecture, called QrnPro, to accelerate Quran applications. QrnPro exploits the data parallelism found in these applications by adding vector processing to a VLIW architecture. The VLIW architecture processes multiple independent scalar instructions concurrently on parallel execution units, while data parallelism is expressed by vector instructions and processed on the same parallel execution units. This combination of VLIW and vector processing uses resources efficiently even when the fraction of data-parallel work is below 100%. An instruction memory of size 256×128-bit stores the scalar/vector instructions of Quran applications as 128-bit VLIW instructions. A single register file (8 vectors × 16 elements × 32 bits, or equivalently 128 × 32-bit registers) stores both multi-scalar and vector elements. The control unit feeds the parallel execution units with the required operands (multi-scalar/vector elements), producing up to 4×32-bit results each clock cycle. Scalar/vector loads and stores move data from/to the QrnPro data memory (512×128-bit) at a rate of 128 bits (4×32-bit elements) per clock cycle. Finally, the writeback stage writes up to four 32-bit results per clock cycle, coming from the memory system or from the execution units, into the QrnPro register file. The proposed QrnPro design is implemented in VHDL targeting the Xilinx Virtex-5 FPGA (XC5VLX110T-3FF1136 device), and its performance is evaluated.
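The sizes quoted in the abstract can be made concrete with a small software model. The sketch below is illustrative only and is not the authors' VHDL design: the class and function names are hypothetical, and the mapping of vector register v onto scalar registers 16v to 16v+15 is an assumption, since the abstract states only that the same 128×32-bit storage can be viewed as 8 vectors of 16 elements. It shows the dual scalar/vector view of the register file and how a 16-element vector add would take 4 cycles if 4 results are produced per cycle.

```python
# Illustrative model of the sizes given in the abstract (not the authors' design).
LANES = 4                  # up to 4 x 32-bit results per clock cycle
NUM_VECTORS = 8            # vector view: 8 vector registers
VECTOR_LENGTH = 16         # 16 elements per vector register
NUM_SCALARS = NUM_VECTORS * VECTOR_LENGTH  # scalar view: 128 x 32-bit registers
MASK32 = 0xFFFFFFFF        # keep values within 32 bits

# Memory capacities implied by the quoted dimensions.
INSTR_MEM_BYTES = 256 * 128 // 8   # 4096 bytes
DATA_MEM_BYTES = 512 * 128 // 8    # 8192 bytes


class RegisterFile:
    """Single storage array with a scalar view and a vector view.

    ASSUMPTION: vector v occupies scalar registers v*16 .. v*16+15.
    """

    def __init__(self):
        self.regs = [0] * NUM_SCALARS

    def read_vector(self, v):
        base = v * VECTOR_LENGTH
        return self.regs[base:base + VECTOR_LENGTH]

    def write_vector(self, v, values):
        values = [x & MASK32 for x in values]
        if len(values) != VECTOR_LENGTH:
            raise ValueError("a vector register holds exactly 16 elements")
        base = v * VECTOR_LENGTH
        self.regs[base:base + VECTOR_LENGTH] = values


def vector_add(rf, dst, src_a, src_b):
    """Element-wise 32-bit add, LANES elements per modeled cycle.

    Returns the number of modeled cycles (hypothetical timing: one group
    of LANES results per cycle, ignoring pipeline fill and memory access).
    """
    a = rf.read_vector(src_a)
    b = rf.read_vector(src_b)
    out = [0] * VECTOR_LENGTH
    cycles = 0
    for start in range(0, VECTOR_LENGTH, LANES):
        for i in range(start, start + LANES):
            out[i] = (a[i] + b[i]) & MASK32
        cycles += 1
    rf.write_vector(dst, out)
    return cycles
```

In this model the same execution lanes could equally be given four independent scalar operations in one cycle, which is the resource sharing between VLIW and vector processing that the abstract describes.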