
Performance of parallel algorithms on a broadcast-based architecture.

Abstract

Research in high-end computing has produced enormous benefits for society. While new data- and computation-intensive applications are appearing all the time, there is evidence that present scalable parallel architectures may not be well suited for these applications. To achieve petaflops computing, advances in hardware technology, architecture, system software, and programming environments are needed.

Due to advances in fiber optics and VLSI technology, interconnection networks that allow multiple simultaneous broadcasts are becoming feasible. The Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus) is a low-latency, high-bandwidth fiber-optic network with the unique feature that every processor is directly connected to the other processors through a dedicated broadcast/output channel. This thesis presents the multiprocessor architecture of the SOME-Bus and examines the performance of representative algorithms for matrix operations and sorting using the message-passing and distributed-shared-memory (DSM) paradigms. It shows that simple enhancements to the network interface and the cache and directory controllers can greatly improve performance; for example, the communication time of a matrix-vector multiplication algorithm is reduced to O(1) using DSM.

Existing parallel loop schemes are extended to make them suitable for the high-end system under study, and the efficient mapping of existing parallel software to the system is studied. Software is implemented, tested, and evaluated for performance on a simulator developed for the system. The thesis also presents enhancements to the network interface and the cache and directory controllers that allow significant overlap of processing time with the communication time due to compulsory misses. Results from the simulated execution of simple algorithms, such as matrix-matrix multiplication, on the SOME-Bus show that block capture and prefetch, combined with an effective block replacement policy, significantly reduce the miss rate due to compulsory misses as the cache size increases, whereas a similar increase in cache size in traditional architectures leaves the miss rate due to compulsory misses unaffected.
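The closing claim about traditional architectures can be illustrated with a small single-node simulation. The sketch below is not the simulator developed in the thesis, and it does not model SOME-Bus block capture or prefetch. It assumes a fully associative LRU cache, a naive triple-loop matrix-matrix multiplication over row-major arrays, and hypothetical helper names (`simulate_lru`, `matmul_trace`). It only shows that the number of compulsory (first-reference) misses stays the same as the cache grows, while the remaining misses shrink.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_blocks):
    """Return (compulsory, other) miss counts for a fully associative LRU cache."""
    cache = OrderedDict()  # block id -> None, ordered least to most recently used
    seen = set()           # blocks that have been referenced at least once
    compulsory = 0
    other = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
            continue
        if block in seen:
            other += 1        # block was cached before and has been evicted
        else:
            compulsory += 1   # first-ever reference to this block
            seen.add(block)
        cache[block] = None
        if len(cache) > capacity_blocks:
            cache.popitem(last=False)  # evict the least recently used block
    return compulsory, other


def matmul_trace(n, words_per_block):
    """Block-level reference trace of a naive n x n product C = A * B."""
    def block(array_id, i, j):
        # Row-major layout: word index i*n + j, grouped into cache blocks.
        return (array_id, (i * n + j) // words_per_block)

    trace = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                trace.append(block("A", i, k))
                trace.append(block("B", k, j))
            trace.append(block("C", i, j))
    return trace


trace = matmul_trace(n=8, words_per_block=4)
results = {size: simulate_lru(trace, size) for size in (4, 16, 64)}
for size, (compulsory, other) in results.items():
    print(f"cache={size:3d} blocks  compulsory={compulsory}  other={other}")
```

In this model the three 8 x 8 arrays occupy 48 distinct blocks, so every cache size incurs 48 compulsory misses. Reducing that count requires bringing blocks into the cache before their first reference, which is the role the abstract attributes to block capture and prefetch.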

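Bibliographic record

  • Author: Narravula, Harsha V.
  • Author affiliation: Drexel University
  • Degree-granting institution: Drexel University
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2003
  • Pages: 92 p.
  • Total pages: 92
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification: Radio electronics, telecommunications technology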
