10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing

Performance of Windows Multicore Systems on Threading and MPI



Abstract

We present performance results on a Windows cluster with up to 768 cores using MPI and two variants of threading: CCR and TPL. CCR (Concurrency and Coordination Runtime) presents a message based interface while TPL (Task Parallel Library) allows for loops to be automatically parallelized. MPI is used between the cluster nodes (up to 32) and either threading or MPI for parallelism on the 24 cores of each node. We use a simple matrix multiplication kernel as well as a significant bioinformatics gene clustering application. We find that the two threading models offer similar performance with MPI outperforming both at low levels of parallelism but threading much better when the grain size (problem size per process) is small. We find better performance on Intel compared to AMD on comparable 24 core systems. We develop simple models for the performance of the clustering code.
