Emmerald: a fast matrix-matrix multiply using Intel's SSE instructions

Douglas Aberdeen; Jonathan Baxter

Abstract

Generalized matrix-matrix multiplication forms the kernel of many mathematical algorithms, so a faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices using the Intel Pentium single instruction, multiple data (SIMD) floating-point architecture. The main difficulty with the Pentium and other commodity processors is the need to use the cache hierarchy efficiently, particularly given the growing gap between main-memory and CPU clock speeds.
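The abstract does not say how Emmerald organizes its loops, so the following is only a generic illustration of the cache concern it raises, not the authors' implementation. It sketches loop tiling (blocking), a common way to reuse data while it is still in cache, next to a plain triple loop. The function names and the `block` parameter are hypothetical, and pure Python will not show the speedups that an SSE kernel in C or assembly would; the point is the loop structure only.

```python
def matmul_naive(a, b):
    """Reference triple loop: C[i][j] = sum over p of A[i][p] * B[p][j]."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_blocked(a, b, block=4):
    """Same product, computed tile by tile.

    `block` is a hypothetical tile edge length. In a tuned kernel it would
    be chosen so that the tiles being worked on fit in cache together.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Outer loops walk over tiles of C, A and B.
    for i0 in range(0, m, block):
        for p0 in range(0, k, block):
            for j0 in range(0, n, block):
                # Inner loops stay inside one tile, so the same small
                # pieces of A, B and C are reused before moving on.
                for i in range(i0, min(i0 + block, m)):
                    row_c = c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        row_b = b[p]
                        # Contiguous run over j: the kind of loop a SIMD
                        # unit could process several elements at a time.
                        for j in range(j0, min(j0 + block, n)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

Both functions accept lists of rows and return the product as a new list of rows. The `min(...)` bounds let the tiled version handle matrix dimensions that are not multiples of `block`.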

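Bibliographic details

Source: Concurrency and Computation, 2001, No. 2, pp. 103-119 (17 pages)
Authors: Douglas Aberdeen; Jonathan Baxter
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: GEMM; SIMD; SSE; matrix multiply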
