HPC Application Address Stream Compression, Replay and Scaling.

Abstract

As the capabilities of high performance computing (HPC) resources have grown over the last decades, a performance gap has developed and expanded between the processor and memory. Processor speeds have improved according to Moore's law, while memory bandwidth has lagged behind. The performance bottleneck created by this gap, termed the "von Neumann bottleneck," has been the driving force behind the development of modern memory subsystems.

Many advances have been aimed at hiding this memory bottleneck. Multi-level cache structures with a variety of implementation policies have been introduced. Memory subsystems have become very complex, and the effectiveness of their structure and policies varies according to the behavior of the application running on the resource.

Memory simulation studies aid in the design of memory subsystems and in acquisition decisions. During a typical acquisition, candidate resources are evaluated to determine their appropriateness for a pre-defined workload. Simulation-aided models provide performance predictions when the hardware is not available for full testing ahead of purchase. However, address streams of full applications may be too large for direct use, complicating memory subsystem simulation.

Memory address streams are extremely large. They can grow at a rate of over 2.6 TB/hour per core. HPC workloads contain applications that run for days across hundreds of processors, generating address streams that are too large to handle directly. However, the memory address streams contain a wealth of information about the behavior of applications that is largely inaccessible.

This work describes a novel compression technique specifically designed to make the information within HPC application address streams accessible and manageable. This compression method has several advantages over previous methods: extremely high compression ratios, low overhead, and a human-readable format. These attributes of the compression technique enable further studies that were previously problematic.

High compression ratios are a necessity for application address streams. Address streams are very large, making them challenging to collect and store. Furthermore, any simulation experiment performed using the stream will be limited by disk speeds, since there is no other plausible place to store and retrieve such volumes of data. The compression technique presented has demonstrated compression ratios in the hundreds of thousands. This leads to file sizes that can easily be emailed between collaborators, and the format can be replayed at least as fast as disk speeds.

The collection overhead for an address stream must be low. The collection takes place on an HPC resource, and HPC resource time is costly. This compression technique has an unsampled average slowdown of 90X. This slowdown is an improvement over the state of the art.

The compressed address stream profiles are human readable. This attribute enables new and interesting uses of application address streams. It is possible to experiment with hypothetical code optimizations using simulation or other metrics rather than actually implementing the optimizations.

Strong scaling analysis of memory behavior is historically challenging. High-level metrics such as execution time and cache miss rates do not lend themselves well to strong scaling studies because they hide the true complexity of the application-machine interactions. This work includes a strong scaling analysis in order to demonstrate the advanced capabilities that can be built upon this compression technique.
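
The figures quoted in the abstract can be combined into a rough sizing estimate. The sketch below is only back-of-envelope arithmetic, not part of the dissertation: the 2.6 TB/hour per core rate and the 90X slowdown come from the abstract, while the compression ratio of 100,000 (the low end of "hundreds of thousands"), the decimal definition of a terabyte, and the function names are assumptions chosen for illustration.

```python
# Back-of-envelope sizing of an address stream, using figures from the abstract.
TB = 10**12  # assumption: decimal terabyte, in bytes
MB = 10**6

RATE_TB_PER_HOUR_PER_CORE = 2.6  # from the abstract ("over 2.6 TB/hour per core")
SLOWDOWN = 90                    # from the abstract (unsampled average slowdown)
COMPRESSION_RATIO = 100_000      # assumption: low end of "hundreds of thousands"


def raw_stream_bytes(cores, hours, rate_tb=RATE_TB_PER_HOUR_PER_CORE):
    """Uncompressed address stream size in bytes for a run of the given shape."""
    return cores * hours * rate_tb * TB


def compressed_bytes(raw, ratio=COMPRESSION_RATIO):
    """Size after compression at the assumed ratio."""
    return raw / ratio


def traced_hours(native_hours, slowdown=SLOWDOWN):
    """Wall-clock hours needed to collect the stream at the quoted slowdown."""
    return native_hours * slowdown


# One core for one hour of execution (hypothetical unit of work).
raw = raw_stream_bytes(cores=1, hours=1)
print(f"raw stream: {raw / TB:.1f} TB")
print(f"compressed: {compressed_bytes(raw) / MB:.1f} MB")
print(f"collection: {traced_hours(1)} hours")
```

Under these assumptions, one core-hour of execution yields 2.6 TB of raw addresses, roughly 26 MB after compression, and about 90 hours of collection time if the quoted rate refers to native execution speed (the abstract does not say which).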

Bibliographic details

Author: Olschanowsky, Catherine Rose Mills
Affiliation: University of California, San Diego
Degree-granting institution: University of California, San Diego
Subject: Computer Science
Degree: Ph.D.
Year: 2011
Pages: 120 p.
Total pages: 120
Original format: PDF
Language: English (eng)
