Journal: Concurrency and Computation: Practice and Experience

KernelHive: a new workflow-based framework for multilevel high performance computing using clusters and workstations with CPUs and GPUs


Abstract

The paper presents KernelHive, a new open-source framework for multilevel parallelization of computations among clusters, among cluster nodes, and finally among the CPUs and GPUs used by a particular application. An application is modeled as a directed acyclic graph whose nodes can run in parallel and can be expanded automatically (node unrolling) depending on the number of computation units available. A methodology is proposed for parallelizing an application and mapping it to the environment; it covers selection of devices using a chosen optimizer, selection of the best grid configurations for compute devices, optimization of data partitioning, and execution. One of possibly many scheduling algorithms can be selected, taking into account criteria such as execution time and power consumption. An easy-to-use GUI is provided for modeling and monitoring, with a repository of ready-to-use constructs and computational kernels. The methodology, execution times, and scalability are demonstrated for a distributed and parallel password-breaking example run in a heterogeneous environment with a cluster and servers with different numbers of nodes and both CPUs and GPUs. Additionally, the performance of the framework is compared with an MPI + OpenCL implementation using a parallel geospatial interpolation application employing up to 40 cluster nodes and 320 cores. Copyright © 2015 John Wiley & Sons, Ltd.
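The workflow model in the abstract (a directed acyclic graph whose nodes may be unrolled according to the number of available computation units) can be illustrated with a small sketch. This is not KernelHive's API: the function names `unroll` and `parallel_levels`, the node names, and the edge-list representation are all hypothetical, chosen only to make the idea concrete. It assumes an unrolled node is replaced by one copy per computation unit, each copy inheriting the original node's incoming and outgoing edges.

```python
from collections import defaultdict


def unroll(edges, unrollable, units):
    """Replace each unrollable node with `units` copies, rewiring its edges.

    Hypothetical helper: every copy keeps all predecessors and successors
    of the node it was expanded from.
    """
    def copies(node):
        if node in unrollable:
            return [f"{node}#{i}" for i in range(units)]
        return [node]

    return [(u2, v2) for u, v in edges for u2 in copies(u) for v2 in copies(v)]


def parallel_levels(edges):
    """Group nodes into levels; nodes within one level do not depend on each other."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    level = sorted(n for n in nodes if indeg[n] == 0)
    levels = []
    while level:
        levels.append(level)
        nxt = []
        for u in level:
            for v in succ[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        level = sorted(nxt)

    # Nodes on a cycle never reach in-degree zero, so they are never emitted.
    if sum(len(lv) for lv in levels) != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return levels


# Assumed three-stage workflow: partition the input, process it, merge results.
edges = [("partition", "crack"), ("crack", "merge")]
expanded = unroll(edges, unrollable={"crack"}, units=3)
levels = parallel_levels(expanded)
print(levels)
```

With three assumed computation units, the middle stage expands into three copies that land in the same level, which is the sense in which unrolled nodes can run in parallel. Device selection, grid configuration, and data partitioning, which the abstract also mentions, are outside the scope of this sketch.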
