Scalable Multithreaded Algorithms for Mutable Irregular Data with Application to Anisotropic Mesh Adaptivity

Abstract

Anisotropic mesh adaptation is a powerful way to directly minimise the computational cost of mesh based simulation. It is particularly important for multi-scale problems where the required number of floating-point operations can be reduced by orders of magnitude relative to more traditional static mesh approaches. Increasingly, finite element/volume codes are being optimised for modern multicore architectures. Inter-node parallelism for mesh adaptivity has been successfully implemented by a number of groups using domain decomposition methods. However, thread-level parallelism using programming models such as OpenMP is significantly more challenging because the underlying data structures are extensively modified during mesh adaptation and a greater degree of parallelism must be realised while keeping the code race-free.

In this thesis we describe a new thread-parallel implementation of four anisotropic mesh adaptation algorithms, namely edge coarsening, element refinement, edge swapping and vertex smoothing. For each of the mesh optimisation phases we describe how safe parallel execution is guaranteed by processing workitems in batches of independent sets and using a deferred-operations strategy to update the mesh data structures in parallel without data contention. Scalable execution is further assisted by creating worklists using atomic operations, which provides a synchronisation-free alternative to reduction-based worklist algorithms. Additionally, we compare graph colouring methods for the creation of independent sets and present an improved version which can run up to 50% faster than existing techniques. Finally, we describe some early work on an interrupt-driven work-sharing for-loop scheduler which is shown to perform better than existing work-stealing schedulers.

Combining all aforementioned novel techniques, which are generally applicable to other unordered irregular problems, we show that despite the complex nature of mesh adaptation and inherent load imbalances, we achieve a parallel efficiency of 60% on an 8-core Intel(R) Xeon(R) Sandy Bridge and 40% using 16 cores on a dual-socket Intel(R) Xeon(R) Sandy Bridge ccNUMA system.
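The idea of processing workitems in batches of independent sets, combined with deferred updates, can be illustrated with a small sequential sketch. This is not the thesis's implementation: the function names (`greedy_colouring`, `independent_batches`, `smooth`), the plain first-fit colouring, and the toy one-dimensional "smoothing" rule are all assumptions made for illustration. The sketch only shows why vertices of the same colour can be handled without conflicts, and how computing updates first and committing them afterwards keeps reads and writes separate.

```python
from collections import defaultdict


def greedy_colouring(adjacency):
    """First-fit colouring: each vertex gets the smallest colour
    not already used by one of its coloured neighbours."""
    colour = {}
    for v in sorted(adjacency):
        used = {colour[n] for n in adjacency[v] if n in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
    return colour


def independent_batches(adjacency):
    """Group vertices by colour. No two vertices in a batch are
    adjacent, so each batch is an independent set."""
    batches = defaultdict(list)
    for v, c in greedy_colouring(adjacency).items():
        batches[c].append(v)
    return [batches[c] for c in sorted(batches)]


def smooth(adjacency, position):
    """Toy vertex smoothing (hypothetical rule): move each vertex to
    the mean position of its neighbours, one independent set at a time."""
    for batch in independent_batches(adjacency):
        # Phase 1: compute updates while only reading shared state.
        # Vertices in a batch share no edge, so in a threaded version
        # these iterations could be distributed across threads.
        deferred = [
            (v, sum(position[n] for n in adjacency[v]) / len(adjacency[v]))
            for v in batch
            if adjacency[v]
        ]
        # Phase 2: commit the deferred updates after all reads are done.
        for v, new_position in deferred:
            position[v] = new_position
    return position
```

For example, calling `independent_batches` on a path graph such as `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}` separates alternating vertices into two batches, each of which could be processed concurrently in a thread-parallel setting.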

Bibliographic details

  • Author

    Rokos Georgios

  • Year 2015
  • Original format PDF
