Conference: International Conference for High Performance Computing, Networking, Storage and Analysis

STS-k: a multilevel sparse triangular solution scheme for NUMA multicores

Abstract

We consider techniques to improve the performance of parallel sparse triangular solution on non-uniform memory access (NUMA) multicores by extending earlier coloring and level-set schemes designed for single-core multiprocessors. We develop STS-k, where k denotes a small number of transformations that reduce latency by increasing the spatial and temporal locality of data accesses. We propose a graph model of data reuse to guide the development of STS-k and to prove that computing an optimal-cost schedule is NP-complete. We observe significant speed-ups with STS-3 on 32-core Intel Westmere-EX and 24-core AMD Magny-Cours processors. For a fixed ordering, the incremental gains from the 3-level transformations in STS-3 alone correspond to reductions in execution time by factors of 1.4 (Intel) and 1.5 (AMD) for level sets, and 2 (Intel) and 2.2 (AMD) for coloring. On average, STS-3 with coloring reduces execution time by a factor of 6 (Intel) and 4 (AMD) compared to a reference implementation using level sets.
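The level-set scheme that serves as the reference point above can be illustrated with a small sketch. This is a minimal, sequential Python rendering of the classic level-set approach to sparse lower-triangular solution, not STS-k itself. The function names `compute_level_sets` and `level_set_solve` are hypothetical, and the CSR layout (`row_ptr`, `col_idx`, `vals`, with the diagonal stored explicitly in each row) is an assumption for illustration. Rows in the same level depend only on rows in earlier levels, so the loop over one level is the part a parallel implementation would distribute across cores, with a synchronization point between levels.

```python
from typing import List


def compute_level_sets(n: int, row_ptr: List[int], col_idx: List[int]) -> List[List[int]]:
    """Group rows of a lower-triangular CSR matrix into level sets (hypothetical helper).

    level[i] = 1 + max(level[j]) over off-diagonal nonzeros L[i, j] with j < i;
    rows with no off-diagonal nonzeros are placed in level 0.
    """
    level = [0] * n
    for i in range(n):  # rows in increasing order, so level[j] is final for every j < i
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    num_levels = max(level) + 1 if n else 0
    level_sets: List[List[int]] = [[] for _ in range(num_levels)]
    for i in range(n):
        level_sets[level[i]].append(i)
    return level_sets


def level_set_solve(n: int, row_ptr: List[int], col_idx: List[int], vals: List[float],
                    b: List[float], level_sets: List[List[int]]) -> List[float]:
    """Solve L x = b by sweeping the level sets in order (hypothetical helper)."""
    x = [0.0] * n
    for rows in level_sets:
        # Rows within one level are mutually independent; a parallel code would
        # split this loop across threads and place a barrier after it.
        for i in rows:
            s = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j < i:
                    s -= vals[k] * x[j]  # x[j] was computed in an earlier level
                elif j == i:
                    diag = vals[k]
            if diag is None:
                raise ValueError(f"row {i} has no stored diagonal entry")
            x[i] = s / diag
    return x


# Usage: a 4x4 lower-triangular matrix
#   [2 0 0 0]
#   [1 4 0 0]
#   [0 0 5 0]
#   [0 3 1 2]
row_ptr = [0, 1, 3, 4, 7]
col_idx = [0, 0, 1, 2, 1, 2, 3]
vals = [2.0, 1.0, 4.0, 5.0, 3.0, 1.0, 2.0]
b = [2.0, 9.0, 15.0, 17.0]

levels = compute_level_sets(4, row_ptr, col_idx)
x = level_set_solve(4, row_ptr, col_idx, vals, b, levels)
print(levels)  # [[0, 2], [1], [3]]
print(x)       # [1.0, 2.0, 3.0, 4.0]
```

In this example rows 0 and 2 have no dependencies and could be solved concurrently, while row 3 must wait for row 1. The number of levels bounds the number of synchronization steps, which is one reason orderings such as coloring, and locality transformations of the kind the abstract describes, matter for performance.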
