Published in: 24th ACM International Conference on Supercomputing, 2010

Making Nested Parallel Transactions Practical using Lightweight Hardware Support


Abstract

Transactional Memory (TM) simplifies parallel programming by supporting parallel tasks that execute in an atomic and isolated way. To achieve the best possible performance, TM must support the nested parallelism available in real-world applications and supported by popular programming models. A few recent papers have proposed support for nested parallelism in software TM (STM) and hardware TM (HTM). However, the proposed designs are still impractical, as they either introduce excessive runtime overheads or require complex hardware structures.

This paper presents filter-accelerated, nested TM (FaNTM). We extend a hybrid TM based on hardware signatures to provide practical support for nested parallel transactions. In the FaNTM design, hardware filters provide continuous and nesting-aware conflict detection, which effectively eliminates the excessive overheads of software nested transactions. In contrast to a full HTM approach, FaNTM simplifies hardware by decoupling nested parallel transactions from caches using hardware filters. We also describe subtle correctness and liveness issues that do not exist in the non-nested baseline TM.

We quantify the performance of FaNTM using STAMP applications and microbenchmarks that use concurrent data structures. First, we demonstrate that the runtime overhead of FaNTM is small (2.3% on average) when applications use only single-level parallelism. Second, we show that the incremental performance overhead of FaNTM is reasonable when the available parallelism is used in deeper nesting levels. We also demonstrate that nested parallel transactions on FaNTM run significantly faster (e.g., 12.4x) than those on a nested STM. Finally, we show how nested parallelism is used to improve the overall performance of a transactional microbenchmark.
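To make the idea of signature-based, nesting-aware conflict detection more concrete, the following is a minimal software sketch. It is not the FaNTM hardware design: the abstract does not specify filter sizes, hash functions, or the conflict rules, so the `Signature` and `Transaction` classes, the `conflicts` function, the filter parameters, and the rule that a transaction never conflicts with its own ancestors are all assumptions made for illustration, based on how Bloom-filter signatures and nested parallel transactions are commonly described.

```python
import hashlib


class Signature:
    """Bloom-filter style address signature (sizes are hypothetical)."""

    def __init__(self, bits=1024, hashes=2):
        self.bits = bits
        self.hashes = hashes
        self.word = 0  # bit vector stored as a Python integer

    def _positions(self, addr):
        # Derive `hashes` bit positions from the address.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def insert(self, addr):
        for p in self._positions(addr):
            self.word |= 1 << p

    def may_contain(self, addr):
        # False positives are possible; false negatives are not.
        return all((self.word >> p) & 1 for p in self._positions(addr))


class Transaction:
    """A transaction with a parent link and read/write signatures."""

    def __init__(self, tid, parent=None):
        self.tid = tid
        self.parent = parent
        self.read_sig = Signature()
        self.write_sig = Signature()

    def is_ancestor_of(self, other):
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False

    def read(self, addr):
        self.read_sig.insert(addr)

    def write(self, addr):
        self.write_sig.insert(addr)


def conflicts(requester, owner, addr, is_write):
    """Return True if requester's access may conflict with owner's sets.

    Assumed nesting rule: accesses never conflict with the requester
    itself or with any of its ancestors.
    """
    if owner is requester or owner.is_ancestor_of(requester):
        return False
    if owner.write_sig.may_contain(addr):
        return True  # read-after-write or write-after-write
    return is_write and owner.read_sig.may_contain(addr)  # write-after-read
```

In this sketch, a child that touches an address its parent has written is not flagged, while two sibling transactions touching the same written address are. Because the signatures are lossy, the check can report conflicts that did not really occur, but it never misses a recorded access.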
