International Conference on Intelligent Data Acquisition and Advanced Computing Systems

Efficient parallelization of batch pattern training algorithm on many-core and cluster architectures



Abstract

This paper presents experimental research on a parallel batch pattern back-propagation training algorithm, using a recirculation neural network as an example, on many-core high-performance computing systems. The choice of a recirculation neural network over multilayer perceptron, recurrent, and radial basis function neural networks is justified. The model of a recirculation neural network and the usual sequential batch pattern algorithm for its training are described theoretically. An algorithmic description of the parallel version of the batch pattern training method is presented. The experiments were carried out using the Open MPI, MVAPICH, and Intel MPI message-passing libraries. The results obtained on a many-core AMD system and on Intel MIC are compared with the results obtained on a cluster system. Our results show that the parallelization efficiency is about 95% on 12 cores located inside one physical AMD processor for the considered minimum and maximum scenarios. On 48 AMD cores the parallelization efficiency is about 70–75% for the minimum and maximum scenarios. These results are higher by 15–36% (depending on the MPI library) than the results obtained on 48 cores of a cluster system. The parallelization efficiency obtained on the Intel MIC architecture is surprisingly low and calls for deeper analysis.
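The core idea of batch pattern parallelization, as described in the abstract, is data-parallel: each process accumulates gradients over its own subset of training patterns, the partial sums are combined across processes (e.g. via an `MPI_Allreduce` with `SUM`), and all processes then apply one identical synchronized weight update. The following is a minimal single-process sketch of that scheme; the model, data, and function names are illustrative assumptions, not taken from the paper, and the reduction over worker partitions stands in for the actual MPI collective.

```python
def gradient(w, x, y):
    # Gradient of the squared error 0.5 * (w*x - y)**2 for one pattern.
    return (w * x - y) * x

def batch_update(w, patterns, lr):
    # Sequential batch step: sum gradients over ALL patterns, then update once.
    g = sum(gradient(w, x, y) for x, y in patterns)
    return w - lr * g

def parallel_batch_update(w, patterns, lr, workers):
    # Same step with the patterns partitioned among `workers` processes.
    chunks = [patterns[i::workers] for i in range(workers)]
    # Each worker computes a partial gradient sum over its own chunk ...
    partial = [sum(gradient(w, x, y) for x, y in chunk) for chunk in chunks]
    # ... and the partial sums are combined, as MPI_Allreduce(SUM) would do.
    g = sum(partial)
    return w - lr * g

patterns = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 6.0)]
w_seq = batch_update(1.0, patterns, lr=0.01)
w_par = parallel_batch_update(1.0, patterns, lr=0.01, workers=4)
assert abs(w_seq - w_par) < 1e-12  # partitioning does not change the update
```

Because the gradients are summed before the single weight update, the parallel version is numerically equivalent to the sequential batch step (up to floating-point reordering), which is what makes the batch pattern approach attractive for message-passing parallelization: only one reduction and one broadcast-style synchronization per epoch.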
