Published in: IEEE International Parallel and Distributed Processing Symposium

Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems



Abstract

The task-based programming model associated with dynamic runtime systems has gained popularity for problems made challenging by workload imbalance, heterogeneous resources, or extreme concurrency. During the last decade, low-rank matrix approximations (where the main idea is to exploit data sparsity, typically by compressing off-diagonal tiles up to an application-specific accuracy threshold) have been adopted to address the curse of dimensionality at extreme scale. In this paper, we create a bridge between the runtime and the linear algebra by communicating knowledge of the data sparsity to the runtime. We design and implement this synergistic approach with high user productivity in mind, in the context of the PaRSEC runtime system and the HiCMA numerical library. This requires extending PaRSEC with new features to integrate rank information into the dataflow so that proper decisions can be made at runtime. We focus on the tile low-rank (TLR) Cholesky factorization for solving 3D data-sparse covariance matrix problems arising in environmental applications. In particular, we employ the 3D exponential model of the Matérn matrix kernel, which exhibits challenging nonuniform high ranks in off-diagonal tiles. We first provide dynamic data structure management driven by a performance model to reduce extra floating-point operations. Next, we optimize the memory footprint of the application by relying on a dynamic memory allocator, supported by a rank-aware data distribution to cope with the workload imbalance. Finally, we expose further parallelism using recursive kernel formulations to shorten the critical path. Our resulting high-performance implementation outperforms the existing data-sparse TLR Cholesky factorization by up to 7-fold on a large-scale distributed-memory system, while reducing the memory footprint by up to 44-fold. This multidisciplinary work highlights the need to empower runtime systems beyond their original duty of task scheduling in order to serve next-generation low-rank matrix algebra libraries.
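To make the tile low-rank idea concrete, here is a minimal sketch of how per-tile rank information can drive a storage decision. It is not the paper's performance model, which the abstract describes as targeting extra floating-point operations. It only compares element counts, assuming an off-diagonal tile of size nb x nb with rank k is stored as two nb x k factors. The function names and the example ranks are hypothetical.

```python
def tile_storage(nb, rank):
    """Element counts for one nb x nb tile: dense, and as two nb x rank factors."""
    dense = nb * nb
    low_rank = 2 * nb * rank
    return dense, low_rank


def choose_format(nb, rank):
    """Keep a tile low-rank only if that stores strictly fewer elements than dense."""
    dense, low_rank = tile_storage(nb, rank)
    return "low_rank" if low_rank < dense else "dense"


def footprint(nb, ranks):
    """Total stored elements for the lower triangle of a tiled symmetric matrix.

    ranks[i][j] with j < i is the rank of off-diagonal tile (i, j).
    Diagonal tiles are always counted as dense.
    Returns (all-dense total, rank-aware total).
    """
    nt = len(ranks)
    total_dense = 0
    total_tlr = 0
    for i in range(nt):
        for j in range(i + 1):
            total_dense += nb * nb
            if i == j:
                total_tlr += nb * nb
            else:
                dense, low_rank = tile_storage(nb, ranks[i][j])
                # Fall back to dense storage when the rank is too high to pay off.
                total_tlr += min(dense, low_rank)
    return total_dense, total_tlr


if __name__ == "__main__":
    # Hypothetical 3 x 3 tiling with nonuniform ranks in the off-diagonal tiles.
    example_ranks = [
        [0, 0, 0],
        [10, 0, 0],
        [5, 60, 0],
    ]
    print(choose_format(100, 10))
    print(choose_format(100, 60))
    print(footprint(100, example_ranks))
```

Under this counting, a tile is worth keeping in low-rank form only when k < nb/2, which is one reason nonuniform high ranks in off-diagonal tiles make a single fixed format wasteful.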
