Journal: Concurrency and Computation

Guided installation of basic linear algebra routines in a cluster with manycore components

Abstract

Computational systems are nowadays composed of basic computational components that share multiprocessors and coprocessors of different types, typically several graphics processing units (GPUs) or many integrated cores (MICs). These components are combined in heterogeneous clusters of nodes with different characteristics, including coprocessors of different types, with varying numbers of nodes at different speeds. Software previously developed and optimized for simpler systems needs to be redesigned and reoptimized for these new, more complex systems. The adaptation of autotuning techniques for basic linear algebra routines to hybrid multicore + multiGPU and multicore + multiMIC systems is analyzed. The matrix-matrix multiplication kernel, which is optimized for different computational system components through guided experimentation, is studied. The routine is installed for each node in the cluster, and the information generated from the individual installations may be used for a hierarchical installation in the cluster. The basic matrix-matrix multiplication may, in turn, be used inside higher level routines, which delegate their efficient execution to the optimization of the lower level routine. Experimental results are satisfactory in different multicore + multiGPU and multicore + multiMIC systems. The guided search of execution configurations for satisfactory execution times thus proves to be a useful tool for heterogeneous systems, where the complexity of the system makes it difficult to use highly efficient routines and libraries correctly.
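
The abstract does not give the search procedure itself, so the following is only a minimal sketch of what a guided search over execution configurations could look like for a matrix-matrix multiplication split between a multicore CPU and one coprocessor. Everything here is an assumption made for illustration: the function names `guided_split_search` and `synthetic_time`, the single tunable parameter (the fraction of the work sent to the coprocessor), and the synthetic cost model with its speeds and transfer overhead. In a real installation, the `measure` callable would run and time the actual routine on the target node.

```python
from typing import Callable, Tuple


def guided_split_search(measure: Callable[[float], float],
                        start: float = 0.5,
                        step: float = 0.25,
                        min_step: float = 0.01) -> Tuple[float, float]:
    """Local search over the fraction of work given to the coprocessor.

    Moves to a neighbouring fraction whenever it is faster, and halves
    the step when neither neighbour improves on the current best.
    """
    best_f, best_t = start, measure(start)
    while step >= min_step:
        improved = False
        for cand in (best_f - step, best_f + step):
            if 0.0 <= cand <= 1.0:
                t = measure(cand)
                if t < best_t:
                    best_f, best_t = cand, t
                    improved = True
        if not improved:
            step /= 2.0
    return best_f, best_t


def synthetic_time(f: float, n: int = 4096,
                   cpu_gflops: float = 100.0,
                   gpu_gflops: float = 300.0,
                   transfer_s: float = 0.05) -> float:
    """Hypothetical cost model, NOT measured data.

    CPU and coprocessor work concurrently, so the total time is the
    slower of the two parts; the coprocessor pays a fixed transfer cost.
    """
    flops = 2.0 * n ** 3  # operation count of an n x n matrix product
    cpu_t = (1.0 - f) * flops / (cpu_gflops * 1e9)
    gpu_t = f * flops / (gpu_gflops * 1e9) + (transfer_s if f > 0 else 0.0)
    return max(cpu_t, gpu_t)


if __name__ == "__main__":
    fraction, seconds = guided_split_search(synthetic_time)
    print(f"coprocessor share: {fraction:.3f}, modelled time: {seconds:.3f} s")
```

The search evaluates only a handful of configurations rather than the whole range, which is the point of guiding the experimentation. The configuration found for each node could then be stored and reused when deciding how to distribute work across the cluster.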
