Published in: International Conference for High Performance Computing, Networking, Storage and Analysis

STS-k: a multilevel sparse triangular solution scheme for NUMA multicores

Abstract

We consider techniques to improve the performance of parallel sparse triangular solution on non-uniform memory architecture (NUMA) multicores by extending earlier coloring and level set schemes for single-core multiprocessors. We develop STS-k, where k denotes a small number of transformations that reduce latency by increasing the spatial and temporal locality of data accesses. We propose a graph model of data reuse to inform the development of STS-k and to prove that computing an optimal-cost schedule is NP-complete. We observe significant speed-ups with STS-3 on 32-core Intel Westmere-EX and 24-core AMD Magny-Cours processors. For a fixed ordering, the incremental gains attributable solely to the 3-level transformations in STS-3 correspond to reductions in execution time by factors of 1.4 (Intel) and 1.5 (AMD) for level sets, and 2 (Intel) and 2.2 (AMD) for coloring. On average, STS-3 with coloring reduces execution times by a factor of 6 (Intel) and 4 (AMD) compared to a reference implementation using level sets.
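For readers unfamiliar with the level set scheme that the abstract uses as its baseline, the following is a minimal sketch of the general idea, not the STS-k method or the authors' reference implementation. It assumes a lower-triangular matrix stored in CSR form with a nonzero diagonal; the function names `level_sets` and `solve_lower` and the small test matrix are hypothetical, chosen only for illustration. Rows are grouped so that every row in a level depends only on rows in earlier levels, which means the rows inside one level could be solved in parallel.

```python
from collections import defaultdict


def level_sets(indptr, indices):
    """Group rows of a sparse lower-triangular CSR matrix into level sets.

    Row i depends on row j whenever there is an off-diagonal entry L[i, j].
    The level of a row is one more than the highest level it depends on.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            if j < i:  # off-diagonal entry: x[i] needs x[j]
                level[i] = max(level[i], level[j] + 1)
    groups = defaultdict(list)
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return [groups[lv] for lv in sorted(groups)]


def solve_lower(indptr, indices, data, b):
    """Solve L x = b by forward substitution, one level set at a time."""
    x = [0.0] * len(b)
    for rows in level_sets(indptr, indices):
        # Rows within one level are mutually independent, so this inner
        # loop is the part a parallel implementation would distribute.
        for i in rows:
            s = b[i]
            diag = None
            for p in range(indptr[i], indptr[i + 1]):
                j = indices[p]
                if j == i:
                    diag = data[p]
                else:
                    s -= data[p] * x[j]
            x[i] = s / diag  # assumes a stored, nonzero diagonal
    return x


# Hypothetical 4x4 example:
# [2 0 0 0]
# [0 1 0 0]
# [1 0 4 0]
# [0 1 1 2]
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 4.0, 1.0, 1.0, 2.0]
b = [2.0, 3.0, 9.0, 9.0]

print(level_sets(indptr, indices))
print(solve_lower(indptr, indices, data, b))
```

In this sketch the number of levels bounds the number of synchronization points, while the order in which rows are visited inside a level determines which entries of `x` are reused close together in time. That ordering is the kind of locality the abstract says STS-k targets, although the sketch itself makes no attempt to optimize it.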
