Performance Analysis of Systems and Software (ISPASS), 2012 IEEE International Symposium on

Speedup stacks: Identifying scaling bottlenecks in multi-threaded applications

Abstract

Multi-threaded workloads typically show sublinear speedup on multi-core hardware, i.e., the achieved speedup is not proportional to the number of cores and threads. Sublinear scaling may have multiple causes, such as poorly scalable synchronization leading to spinning and/or yielding, and interference in shared resources such as the last-level cache (LLC) as well as the main memory subsystem. It is vital for programmers and processor designers to understand scaling bottlenecks in existing and emerging workloads in order to optimize application performance and design future hardware. In this paper, we propose the speedup stack, which quantifies the impact of the various scaling delimiters on multi-threaded application speedup in a single stack. We describe a mechanism for computing speedup stacks on a multi-core processor, and we find speedup stacks to be accurate within 5.1% on average for sixteen-threaded applications. We present several use cases: we discuss how speedup stacks can be used to identify scaling bottlenecks, classify benchmarks, optimize performance, and understand LLC performance.
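The idea of a single stack can be illustrated with a small sketch. It is not the mechanism described in the paper, which the abstract does not spell out. It assumes, hypothetically, that each thread's lost cycles have already been attributed to named scaling delimiters, and that each delimiter's share of the stack is its total lost cycles divided by the parallel execution time. The function name `speedup_stack`, the component labels, and all cycle counts are made up for illustration.

```python
from typing import Dict, List


def speedup_stack(t_parallel: float,
                  overheads: List[Dict[str, float]]) -> Dict[str, float]:
    """Build an illustrative speedup stack whose components sum to N.

    t_parallel: execution time (cycles) of the multi-threaded run.
    overheads:  one dict per thread, mapping a scaling delimiter name to
                the cycles that thread lost to it (assumed to be known).
    """
    n_threads = len(overheads)
    stack: Dict[str, float] = {}
    # Each delimiter's height is the speedup it costs: lost cycles,
    # summed over all threads, normalized by the parallel execution time.
    for per_thread in overheads:
        for name, cycles in per_thread.items():
            stack[name] = stack.get(name, 0.0) + cycles / t_parallel
    # Whatever remains of the ideal speedup N is the estimated speedup.
    stack["base speedup"] = n_threads - sum(stack.values())
    return stack


# Hypothetical numbers: 4 threads, 1000-cycle parallel run,
# and identical per-thread losses.
per_thread_loss = {
    "spinning": 100.0,
    "yielding": 50.0,
    "LLC interference": 150.0,
    "memory interference": 100.0,
}
example = speedup_stack(1000.0, [dict(per_thread_loss) for _ in range(4)])

for component, height in example.items():
    print(f"{component:20s} {height:.2f}")
```

With these made-up inputs the delimiters account for about 1.6 of the ideal speedup of 4, leaving an estimated speedup of roughly 2.4. The tallest component in such a stack is the first candidate when looking for a scaling bottleneck.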
