
Dynamic Machine Level Resource Allocation to Improve Tasking Performance Across Multiple Processes


Abstract

Across the landscape of computing, parallelism within applications is increasingly important in order to track advances in hardware capability and meet critical performance metrics. However, writing parallel applications is difficult to do in a scalable way, which has led to the creation of tasking libraries and language extensions like OpenMP, Intel Threading Building Blocks, Qthreads, and more. These tools abstract parallel execution by expressing it in terms of work units (tasks) rather than specific hardware details. This abstraction enables scaling and allows programmers to write software solutions that can leverage whatever level of parallelism is available.

However, the typical task scheduler is greedy and naïve. Thus, concurrent parallel processes compete for computational resources, which results in unnecessary context switches, mis-timed synchronization, unnecessary resource contention, and the associated consequences. By providing a mechanism of communication between the task schedulers, processes can cooperate to more effectively utilize hardware and avoid the negative consequences of coarse-grained resource contention. This work uses Qthreads to demonstrate that cooperative allocation of computational resources reduces contention and decreases execution time. The overhead added for the resource allocation is shown to have minimal impact. Using the Unbalanced Tree Search (UTS) and High Performance Conjugate Gradient (HPCG) benchmarks, execution time across concurrent processes shows significant decreases across a range of machines running a variety of hardware resources and software configurations. Tests also indicate that dynamic compute-resource allocation provides a clear performance benefit even when hardware resources are oversubscribed: when there are more processes than processing units.
UTS tests saw an average 4.98% reduction in execution time on Linux compared to the Qthreads yielding option, and an 89.32% reduction on Apple OS X. For HPCG, partitioning reduced execution time by an average of 22.31% compared to the default Qthreads configuration across all test platforms.
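To make the idea of cooperative partitioning concrete, here is a minimal sketch of how a fixed pool of processing units might be divided among concurrent processes instead of each process greedily claiming every unit. It is an illustration under stated assumptions, not the mechanism implemented in the thesis: the function names `partition_units` and `percent_reduction` are hypothetical, and the even-split policy is only one possible allocation rule.

```python
from typing import List


def partition_units(total_units: int, num_processes: int) -> List[int]:
    """Split processing units as evenly as possible among processes.

    Hypothetical policy: each process receives at least one worker, so when
    processes outnumber units the machine is still oversubscribed, but only
    by one worker per process rather than a full-width pool per process.
    """
    if total_units < 1 or num_processes < 1:
        raise ValueError("total_units and num_processes must be positive")
    base, extra = divmod(total_units, num_processes)
    # The first `extra` processes absorb the remainder, one unit each.
    return [max(1, base + (1 if i < extra else 0)) for i in range(num_processes)]


def greedy_worker_count(total_units: int, num_processes: int) -> int:
    """Workers created when every process claims all units for itself."""
    return total_units * num_processes


def percent_reduction(baseline: float, improved: float) -> float:
    """Reduction in execution time, as a percentage of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    units, procs = 16, 3
    shares = partition_units(units, procs)
    print("partitioned shares:", shares, "total workers:", sum(shares))
    print("greedy total workers:", greedy_worker_count(units, procs))
```

With a greedy scheduler in each process, the total number of workers grows with the number of processes, while a partitioned allocation keeps it at the number of processing units whenever there are at least as many units as processes. `percent_reduction` shows how figures such as those quoted above are conventionally computed from a baseline and an improved execution time.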

Bibliographic record

  • Author: Thatcher Richard Walter
  • Year: 2016
  • Original format: PDF
