Hybrid-parallel sparse matrix-vector multiplication with explicit communication overlap on current multicore-based systems



Abstract

We evaluate optimized parallel sparse matrix-vector operations for several representative application areas on widespread multicore-based cluster configurations. First the single-socket baseline performance is analyzed and modeled with respect to basic architectural properties of standard multicore chips. Beyond the single node, the performance of parallel sparse matrix-vector operations is often limited by communication overhead. Starting from the observation that nonblocking MPI is not able to hide communication cost using standard MPI implementations, we demonstrate that explicit overlap of communication and computation can be achieved by using a dedicated communication thread, which may run on a virtual core. Moreover we identify performance benefits of hybrid MPI/OpenMP programming due to improved load balancing even without explicit communication overlap. We compare performance results for pure MPI, the widely used "vector-like" hybrid programming strategies, and explicit overlap on a modern multicore-based cluster and a Cray XE6 system.
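
The idea of overlapping communication with computation through a dedicated thread can be illustrated with a small sketch. It is not the authors' implementation: it assumes the matrix rows owned by one process are split into a local part (columns whose vector entries are already available) and a remote part (columns whose entries must be received first), both in compressed row storage (CRS). The names `spmv_crs`, `overlapped_spmv`, and `fetch_halo` are hypothetical, and `fetch_halo` merely stands in for the MPI halo exchange a real code would perform.

```python
import threading


def spmv_crs(val, col, row_ptr, x, y):
    """Accumulate y += A @ x for a matrix A stored in CRS format."""
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += val[k] * x[col[k]]
        y[i] += acc


def overlapped_spmv(local, remote, x_local, fetch_halo):
    """Compute y = A_local @ x_local + A_remote @ x_halo.

    `local` and `remote` are (val, col, row_ptr) triples with the same
    number of rows. `fetch_halo` is a hypothetical callable that returns
    the remote vector entries; it runs on a dedicated thread while the
    main thread works on the local part.
    """
    n_rows = len(local[2]) - 1
    y = [0.0] * n_rows
    halo = {}

    def communicate():
        # Placeholder for the halo exchange (assumption, not real MPI).
        halo["x"] = fetch_halo()

    comm_thread = threading.Thread(target=communicate)
    comm_thread.start()                # communication starts here
    spmv_crs(*local, x_local, y)       # local work proceeds meanwhile
    comm_thread.join()                 # wait until remote entries arrived
    spmv_crs(*remote, halo["x"], y)    # finish with the remote part
    return y


# Rows [[2, 0, 1, 0], [0, 3, 0, 4]] times x = [1, 2, 3, 4];
# columns 0-1 are local, columns 2-3 come from another process.
local_part = ([2.0, 3.0], [0, 1], [0, 1, 2])
remote_part = ([1.0, 4.0], [0, 1], [0, 1, 2])
result = overlapped_spmv(local_part, remote_part, [1.0, 2.0],
                         lambda: [3.0, 4.0])
print(result)
```

In pure Python the two threads only truly overlap when the communication call blocks outside the interpreter (for example on network I/O), so the sketch shows the control flow rather than the performance effect discussed in the abstract.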
