The performance of parallel matrix algorithms on a broadcast-based architecture

Constantine Katsinis; Diana Hecht; Ming Zhu; Harsha Narravula

Abstract

Due to advances in fiber-optics and very large scale integration (VLSI) technology, interconnection networks which allow multiple simultaneous broadcasts are becoming feasible. This paper summarizes one such multiprocessor architecture called the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus). It also presents enhancements to the network interface and the cache and directory controllers which support cache block combining, capture and prefetch, and allow complete overlap of processing time with the communication time due to compulsory misses. The paper uses two fundamental matrix algorithms to characterize the impact of each enhancement on performance. Cache miss analysis and results from the execution of these programs on a SOME-Bus simulator show that block capture and prefetch combined with an effective block replacement policy succeed in significantly reducing the miss rate due to compulsory misses as the cache size increases, while a similar increase of cache size in traditional architectures leaves the miss rate due to compulsory misses unaffected.
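The abstract's closing contrast, that enlarging a traditional cache leaves compulsory misses unaffected, can be illustrated with a minimal sketch. It covers only the traditional baseline: it does not model the SOME-Bus, block capture, prefetch, or the paper's simulator. The LRU replacement policy, the naive matrix multiply trace, the 4-word block size, and the names `matmul_trace` and `simulate` are all assumptions made for illustration and are not taken from the paper.

```python
from collections import OrderedDict


def matmul_trace(n, block_words):
    """Yield the block numbers touched by a naive n x n multiply C = A * B.

    Assumed layout (hypothetical): A, B and C are stored row-major,
    back to back, starting at word address 0.
    """
    base_a, base_b, base_c = 0, n * n, 2 * n * n
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield (base_a + i * n + k) // block_words  # read A[i][k]
                yield (base_b + k * n + j) // block_words  # read B[k][j]
            yield (base_c + i * n + j) // block_words      # write C[i][j]


def simulate(n, cache_blocks, block_words=4):
    """Return (accesses, compulsory_misses, other_misses) for an LRU cache."""
    cache = OrderedDict()  # keys are resident blocks, oldest first
    seen = set()           # every block referenced at least once
    accesses = compulsory = other = 0
    for block in matmul_trace(n, block_words):
        accesses += 1
        if block in cache:
            cache.move_to_end(block)   # hit: mark as most recently used
            continue
        if block in seen:
            other += 1                 # capacity miss: block was evicted earlier
        else:
            compulsory += 1            # first reference to this block
            seen.add(block)
        cache[block] = True
        if len(cache) > cache_blocks:
            cache.popitem(last=False)  # evict the least recently used block
    return accesses, compulsory, other


if __name__ == "__main__":
    for size in (4, 16, 64):
        print(size, simulate(8, size))
```

In this sketch the compulsory miss count depends only on how many distinct blocks the program touches, so it stays the same at every cache size, while the remaining misses shrink as the cache grows. The paper's claim is that block capture and prefetch on the SOME-Bus break this pattern; that mechanism is not reproduced here.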

Bibliographic details

Source
Concurrency and Computation, 2006, Issue 3, pp. 271-303 (33 pages)
Authors
Constantine Katsinis; Diana Hecht; Ming Zhu; Harsha Narravula
Affiliation

Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA 19104, U.S.A.;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
multiprocessors; broadcast architectures; numerical algorithms

Similar literature

1. A new congestion control algorithm for improving the performance of a broadcast-based multiprocessor architecture [J]. Cigdem Inan Aci, Mehmet Fatih Akay. Journal of Parallel and Distributed Computing. 2010, Issue 9
2. A novel parallel algorithm for large-scale Fock matrix construction with small locally distributed memory architectures: RT parallel algorithm [J]. Takashima H., Yamada S., Obara S. Journal of Computational Chemistry: Organic, Inorganic, Physical, Biological. 2002, Issue 14
3. A novel parallel algorithm for large-scale Fock matrix construction with small locally distributed memory architectures: RT parallel algorithm [J]. Hajime Takashima, Kunihiro Kitamura, So Yamada. Journal of Computational Chemistry: Organic, Inorganic, Physical, Biological. 2002, Issue 14a15
4. Parallel matrix algorithms on a broadcast-based architecture [C]. Katsinis, C., Hecht, . 2004
5. Performance of parallel algorithms on a broadcast-based architecture [D]. Narravula, Harsha V. 2003
6. Enhancing the usability and performance of structured association mapping algorithms using automation, parallelization, and visualization in the GenAMap software system [O]. Ross E Curtis, Anuj Goyal, Eric P Xing. 2012
7. Performance evaluation of multiple precision matrix multiplications using parallelized Strassen and Winograd algorithms [O]. Kouya, Tomonori. 2015
