Performance of parallel algorithms on a broadcast-based architecture.

Abstract

Research in high-end computing has produced enormous benefits to society. While new data- and computation-intensive applications are appearing all the time, there is evidence that present scalable parallel architectures may not be well suited for these applications. To achieve petaflops computing, advances in hardware technology, architecture, system software, and programming environments are needed.

Due to advances in fiber optics and VLSI technology, interconnection networks that allow multiple simultaneous broadcasts are becoming feasible. The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth fiber-optic network with the unique feature that every processor is directly connected to all other processors through a dedicated broadcast/output channel. This thesis presents the multiprocessor architecture of the SOME-Bus and examines the performance of representative algorithms for matrix operations and sorting under the message-passing and distributed-shared-memory (DSM) paradigms. It shows that simple enhancements to the network interface and to the cache and directory controllers can greatly improve performance; for example, the communication time of a matrix-vector multiplication algorithm is reduced to O(1) using DSM.

Existing parallel loop schemes are extended to make them suitable for the high-end system under study, and efficient mapping of existing parallel software to the system is studied. Software is implemented, tested, and evaluated for performance on a simulator developed for the system. The thesis also presents enhancements to the network interface and to the cache and directory controllers that allow significant overlap of processing time with the communication time caused by compulsory misses. Results from the simulated execution of simple algorithms such as matrix-matrix multiplication on the SOME-Bus show that block capture and prefetch, combined with an effective block replacement policy, significantly reduce the miss rate due to compulsory misses as the cache size increases, whereas a similar increase in cache size in traditional architectures leaves the miss rate due to compulsory misses unaffected.
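To make the matrix-vector claim concrete, the following is a minimal illustrative sketch, not the thesis's algorithm or simulator. It assumes a row-block distribution in which each node owns a block of rows of A and the matching slice of x, and it models a network where every node has its own broadcast channel, so all slices can be exchanged in a single round. The function names (`matvec_broadcast`, `shared_bus_rounds`) and the round-counting model are assumptions made for this example.

```python
from typing import List, Tuple


def matvec_broadcast(A: List[List[float]], x: List[float],
                     num_nodes: int) -> Tuple[List[float], int]:
    """Simulate y = A x on `num_nodes` nodes with one broadcast channel per node.

    Returns the result vector and the number of broadcast rounds used.
    """
    n = len(x)
    if n % num_nodes != 0:
        raise ValueError("vector length must be divisible by num_nodes")
    rows_per_node = n // num_nodes

    # Each node initially owns one contiguous slice of x.
    owned = [x[p * rows_per_node:(p + 1) * rows_per_node]
             for p in range(num_nodes)]

    # Assumption: every node transmits its slice on its own channel at the
    # same time, and every node listens to all channels, so one round is
    # enough for each node to assemble the full vector.
    broadcast_rounds = 1
    full_x = [value for piece in owned for value in piece]

    # Each node then computes the entries of y for the rows it owns.
    y: List[float] = []
    for p in range(num_nodes):
        for i in range(p * rows_per_node, (p + 1) * rows_per_node):
            y.append(sum(A[i][j] * full_x[j] for j in range(n)))
    return y, broadcast_rounds


def shared_bus_rounds(num_nodes: int) -> int:
    """Rounds needed if only one node may broadcast at a time (for contrast)."""
    return num_nodes


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    y, rounds = matvec_broadcast(A, x, num_nodes=2)
    print(y, rounds, shared_bus_rounds(2))
```

In this model the number of broadcast rounds stays at one regardless of the node count, while a single shared bus would need one round per node. The sketch counts rounds only; it does not model message size, latency, or the cache and directory enhancements described in the abstract.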


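Bibliographic record

Author: Narravula, Harsha V.
Author affiliation: Drexel University
Degree-granting institution: Drexel University
Discipline: Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2003
Pages: 92
Original format: PDF
Language: English
Chinese Library Classification: Radio electronics, telecommunications technology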

