Conference: IEEE International Symposium on Parallel & Distributed Processing

Performance evaluation of concurrent collections on high-performance multicore computing systems

Abstract

This paper is the first extensive performance study of a recently proposed parallel programming model, called Concurrent Collections (CnC). In CnC, the programmer expresses her computation in terms of application-specific operations, partially ordered by semantic scheduling constraints. The CnC model is well-suited to expressing asynchronous-parallel algorithms, so we evaluate CnC using two dense linear algebra algorithms in this style for execution on state-of-the-art multicore systems: (i) a recently proposed asynchronous-parallel Cholesky factorization algorithm, (ii) a novel and non-trivial "higher-level" partly-asynchronous generalized eigensolver for dense symmetric matrices. Given a well-tuned sequential BLAS, our implementations match or exceed competing multithreaded vendor-tuned codes by up to 2.6×. Our evaluation compares with alternative models, including ScaLAPACK with a shared-memory MPI, OpenMP, Cilk++, and PLASMA 2.0, on Intel Harpertown, Nehalem, and AMD Barcelona systems. Looking forward, we identify new opportunities to improve the CnC language and runtime scheduling and execution.
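
To make the idea of operations "partially ordered by semantic scheduling constraints" concrete, here is a minimal Python sketch. It is not the CnC API and not the paper's implementation: the names `cholesky_steps` and `run_dataflow` are hypothetical, each "tile" is a single number rather than a matrix block handled by BLAS kernels, and a sequential loop that fires ready steps in random order stands in for a parallel runtime. Each step declares the items it reads and the one item it writes, and a step may run as soon as all of its inputs exist.

```python
import math
import random


def cholesky_steps(n):
    """Yield (inputs, output, compute) for each step of an n x n factorization.

    Item keys: ("A", i, j, k) is entry (i, j) after k trailing updates;
    ("L", i, j) is a finished entry of the lower-triangular factor.
    """
    for k in range(n):
        # Factor the diagonal entry once it has received all k updates.
        yield [("A", k, k, k)], ("L", k, k), lambda a: math.sqrt(a)
        for i in range(k + 1, n):
            # Scale the entry below the diagonal by the finished diagonal.
            yield [("A", i, k, k), ("L", k, k)], ("L", i, k), lambda a, d: a / d
            for j in range(k + 1, i + 1):
                # Trailing update of entry (i, j) using column k.
                yield ([("A", i, j, k), ("L", i, k), ("L", j, k)],
                       ("A", i, j, k + 1),
                       lambda a, lik, ljk: a - lik * ljk)


def run_dataflow(A, seed=0):
    """Run the steps in a random order that respects their data dependences."""
    n = len(A)
    rng = random.Random(seed)
    # Item store: every key is written exactly once (single assignment).
    items = {("A", i, j, 0): A[i][j] for i in range(n) for j in range(i + 1)}
    pending = list(cholesky_steps(n))
    while pending:
        # A step is ready when every item it reads has been produced.
        ready = [idx for idx, step in enumerate(pending)
                 if all(key in items for key in step[0])]
        inputs, output, compute = pending.pop(rng.choice(ready))
        assert output not in items  # enforce single assignment
        items[output] = compute(*(items[key] for key in inputs))
    return [[items[("L", i, j)] if j <= i else 0.0 for j in range(n)]
            for i in range(n)]


A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
print(run_dataflow(A, seed=1))
# prints [[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]
```

Because every item is written once and each step is a fixed function of its inputs, any seed yields the same factor. That property is what lets a runtime choose the execution order freely, including running independent steps at the same time.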
