Codevelopment of Multi-Level Instruction Set Architecture and Hardware for an Efficient Matrix Processor

Mostafa I. Soliman; Abdulmajid F. Al-Junaid


Abstract

The instruction set architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. A multi-level ISA is proposed to communicate data parallelism to the hardware (processor) explicitly and compactly, instead of relying on dynamic extraction by complex hardware or static extraction by sophisticated compiler techniques. This paper presents the codevelopment of a multi-level ISA and hardware for an efficient matrix processor called Mat-Core. Mat-Core extends a general-purpose scalar processor with a matrix unit for processing vector/matrix data. To hide memory latency, the matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues. As in vector architectures, the data computation unit is organized in parallel lanes. On these lanes, however, Mat-Core can execute scalar-matrix, vector-matrix, and matrix-matrix instructions in addition to scalar-vector and vector-vector instructions. Mat-Core leads to a compiler model that is efficient in terms of both performance and executable code size. With four parallel lanes and matrix registers of 8×4 (32) elements, our results show performance of about 1.6, 2.1, 4.1, and 6.4 FLOPs per clock cycle on scalar-vector multiplication, SAXPY, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
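The decoupling described in the abstract can be illustrated with a small functional sketch. It is not the authors' design (the keywords point to a SystemC implementation, which is not reproduced here); it only models an address-generation side that enqueues lane-wide operand strips and a data-computation side that drains the queue, using SAXPY as the workload. The names `address_generation`, `data_computation`, and `saxpy` are hypothetical, the two sides run one after the other rather than concurrently, and the step count is not a cycle-accurate figure.

```python
from collections import deque

LANES = 4  # number of parallel lanes, matching the configuration in the abstract


def address_generation(x, y, queue):
    """Access side (hypothetical model): walk the vectors in lane-wide
    strips and enqueue each strip of operands for the compute side."""
    for base in range(0, len(x), LANES):
        queue.append((base, x[base:base + LANES], y[base:base + LANES]))


def data_computation(a, queue, out):
    """Execute side (hypothetical model): drain the data queue; each strip
    stands for one step in which every lane handles one element."""
    steps = 0
    while queue:
        base, xs, ys = queue.popleft()
        for lane, (xv, yv) in enumerate(zip(xs, ys)):
            out[base + lane] = a * xv + yv  # one multiply and one add per element
        steps += 1
    return steps


def saxpy(a, x, y):
    """Compute a*x + y through the queue; return the result and the strip count."""
    queue = deque()
    out = [0.0] * len(x)
    address_generation(x, y, queue)
    steps = data_computation(a, queue, out)
    return out, steps
```

Calling `saxpy(2.0, x, y)` on five-element vectors processes two strips: one full strip of four elements and one partial strip of one. In real hardware the queue lets address generation run ahead of computation, which is how memory latency is hidden; the sequential sketch shows only the data flow between the two sides.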

Bibliographic details

Source: Neural, Parallel & Scientific Computations, 2010, No. 1, pp. 59-74 (16 pages)
Authors: Mostafa I. Soliman; Abdulmajid F. Al-Junaid
Affiliations:
Computer & System Section, Electrical Engineering Department, Faculty of Engineering, South Valley University, Aswan, Egypt;
Computer & System Section, Electrical Engineering Department, Faculty of Engineering, Assiut University, Egypt
Original format: PDF
Language: English
Keywords: high performance computing; multi-level ISA; performance evaluation; SystemC implementation; vector/matrix processing

Similar literature

1. Resource Sharing of Pipelined Custom Hardware Extension for Energy-Efficient Application-Specific Instruction Set Processor Design [J]. Hai Lin, Yunsi Fei. ACM Transactions on Design Automation of Electronic Systems, 2012, No. 4.
2. Instruction scheduling and transformation for a VLIW unified reduced instruction set computer/digital signal processor processor with shared register architecture [J]. Cheng-Yu Lee, Min-Chin Hung, Rong-Guey Chang. Concurrency and Computation: Practice and Experience, 2014, No. 1.
3. Hardware Implementation of Instruction Level Parallel Architecture Incorporating Special Functional Units for Image Processing Algorithms [J]. M. Kannan, S.K. Srivatsa. Information Technology Journal, 2006, No. 3.
4. Codevelopment of Multi-Level ISA and Hardware for an Efficient Matrix Processor [C]. Mostafa I. Soliman, Abdulmajid F. Al-Junaid. International Conference on Computer Engineering Systems, 2009.
5. Hardware and software architectures for energy- and resource-efficient signal processing systems [D]. Cho, Inkeun. 2014.
6. A Preferential Design Approach for Energy-Efficient and Robust Implantable Neural Signal Processing Hardware [O]. Seetharam Narasimhan, Hillel J. Chiel, Swarup Bhunia.
7. A Hybrid Hardware/Software Architecture That Combines a 4-wide Very Long Instruction Word Software Processor (VLIW) with Application-specific Super-complex Instruction Set Hardware Functions [O]. Kusic, Dara Marie. 2005.
