Measuring, Modeling, and Optimizing Counterintuitive Performance Phenomena in Power-Scalable, Parallel Systems.

Abstract

The demands of exascale computing systems and applications have pushed for a rapid, continual design paradigm. Coupled with the increasing design complexity that arises from interactions among the application, the middleware, and the underlying system hardware, this forms a breeding ground for inefficiency. This work seeks to improve system efficiency by exposing the root causes of unexpected performance slowdowns (e.g., lower performance at higher processor speeds) that occur more frequently in power-scalable systems, where raw processor speed varies. More precisely, we perform an exhaustive empirical study that conclusively shows that increasing processor speed often reduces performance and wastes energy. Our experimental work shows that the frequency of occurrence and the magnitude of slowdowns grow with clock frequency and parallelism, indicating that such slowdowns will be observed increasingly often as trends in processor and system design continue.

Performance speedups at lower frequencies (or slowdowns at higher frequencies) have been anecdotally observed in the prevailing literature since 2004, but no research has explained or exploited this phenomenon. This work conclusively demonstrates that performance slowdowns during processor speedup phases can exceed 47% in common I/O workloads. Our hypothesis challenges (and ultimately debunks) a fundamental assumption in computer systems: that faster processor speeds result in the same or better performance.

In this work, with the use of code and kernel instrumentation, exhaustive experiments, and deep insight into the inner workings of the Linux I/O subsystem, I overcome the aforementioned challenges of variance, complexity, and nondeterminism and identify I/O resource contention as the root cause of the slowdowns during processor speedup. Specifically, the contention arises in the Linux kernel when the journaling block device (JBD) interacts with the ext3/4 file system, introducing file write delays and file synchronization delays.

To fully explain how such I/O contention causes the performance anomaly, I propose analytical models of resource contention among I/O threads to describe the root cause of the observed I/O slowdowns when processors speed up. To this end, I introduce LUC, a runtime system to limit the unintended consequences of power scaling, and demonstrate the effectiveness of the LUC system for two critical parallel transaction-oriented workloads: a mail server (varMail) and online transaction processing (oltp).
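The file write delays and file synchronization delays named in the abstract can be observed from user space with a simple timing probe. The sketch below is only an illustration and is not the instrumentation used in the dissertation: the function names `time_write_and_fsync` and `measure` are hypothetical, the probe file lives in a temporary directory that may or may not sit on an ext3/4 file system, and comparing runs at different processor frequencies is left to whatever frequency-control tooling the reader's system provides.

```python
import os
import tempfile
import time


def time_write_and_fsync(path, payload, rounds):
    """Append `payload` to `path` `rounds` times, timing write and fsync separately.

    Hypothetical helper for illustration; returns two lists of seconds.
    """
    write_times, fsync_times = [], []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(rounds):
            t0 = time.perf_counter()
            os.write(fd, payload)  # normally lands in the page cache
            t1 = time.perf_counter()
            os.fsync(fd)           # blocks until the data is flushed to storage
            t2 = time.perf_counter()
            write_times.append(t1 - t0)
            fsync_times.append(t2 - t1)
    finally:
        os.close(fd)
    return write_times, fsync_times


def measure(rounds=20, size=4096):
    """Run the probe in a temporary directory and return (writes, fsyncs, file_size)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.dat")
        writes, fsyncs = time_write_and_fsync(path, b"x" * size, rounds)
        return writes, fsyncs, os.path.getsize(path)


if __name__ == "__main__":
    writes, fsyncs, total = measure()
    print(f"bytes written: {total}")
    print(f"mean write latency: {sum(writes) / len(writes):.6f} s")
    print(f"mean fsync latency: {sum(fsyncs) / len(fsyncs):.6f} s")
```

Running several copies of this probe concurrently, once per processor frequency setting, would be one way to look for the contention effect the abstract describes; a single-threaded run only shows baseline latencies.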

Bibliographic record

Author: Chang, Hung-Ching
Author affiliation: Virginia Polytechnic Institute and State University
Degree-granting institution: Virginia Polytechnic Institute and State University
Discipline: Computer science
Degree: Ph.D.
Year: 2015
Pages: 197
Original format: PDF
Language of text: English (eng)
