IEEE International Parallel and Distributed Processing Symposium

ZNN -- A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-core and Many-Core Shared Memory Machines


Abstract

Convolutional networks (ConvNets) have become a popular approach to computer vision. It is important to accelerate ConvNet training, which is computationally costly. We propose a novel parallel algorithm based on decomposition into a set of tasks, most of which are convolutions or FFTs. Applying Brent's theorem to the task dependency graph implies that linear speedup with the number of processors is attainable within the PRAM model of parallel computation, for wide network architectures. To attain such performance on real shared-memory machines, our algorithm computes convolutions converging on the same node of the network with temporal locality to reduce cache misses, and sums the convergent convolution outputs via an almost wait-free concurrent method to reduce time spent in critical sections. We implement the algorithm with a publicly available software package called ZNN. Benchmarking with multi-core CPUs shows that ZNN can attain speedup roughly equal to the number of physical cores. We also show that ZNN can attain over 90× speedup on a many-core CPU (Xeon Phi Knights Corner). These speedups are achieved for network architectures with widths that are in common use. The task parallelism of the ZNN algorithm is suited to CPUs, while the SIMD parallelism of previous algorithms is compatible with GPUs. Through examples, we show that ZNN can be either faster or slower than certain GPU implementations depending on specifics of the network architecture, kernel sizes, and density and size of the output patch. ZNN may be less costly to develop and maintain, due to the relative ease of general-purpose CPU programming.
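A brief note on the speedup claim (notation ours, not taken from the paper): for a task dependency graph with total work W and critical-path depth D, Brent's theorem bounds the running time on p processors by

    T_p \le \frac{W}{p} + D, \qquad S_p = \frac{T_1}{T_p} \ge \frac{p}{1 + pD/W}.

The speedup S_p approaches p whenever W/D \gg p. Wide ConvNet layers generate many independent convolution/FFT tasks per level, so W/D is large and near-linear speedup is attainable in the PRAM model, as the abstract states.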
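The "almost wait-free" summation of convergent convolution outputs can be pictured with a small sketch. The C++ fragment below is our illustration under assumed names (NodeAccumulator, contribute, and the buffer layout are hypothetical, not the ZNN source): contributing tasks park partial sums through atomic pointer exchange and merge each other's parked buffers instead of queuing on a lock, and the last contributor hands the completed node sum to the downstream task.

// sketch_accumulate.cpp -- illustrative only; not the ZNN implementation.
// Accumulates per-edge convolution outputs into a single node sum using
// atomic pointer exchange, so contributors rarely block on one another.
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Buffer = std::vector<float>;

struct NodeAccumulator {
    std::atomic<Buffer*> pending{nullptr};  // one parked partial sum, or null
    std::atomic<int>     remaining;         // contributions still expected
    explicit NodeAccumulator(int n) : remaining(n) {}

    // Called by each task once its convolution output `mine` is ready.
    // Returns the final sum when this call is the last contributor,
    // otherwise nullptr. Buffer ownership passes through `pending`.
    Buffer* contribute(Buffer* mine) {
        for (;;) {
            // Grab any partial sum parked by another task and fold it in.
            if (Buffer* other = pending.exchange(nullptr)) {
                for (std::size_t i = 0; i < mine->size(); ++i)
                    (*mine)[i] += (*other)[i];
                delete other;
            }
            // Try to park our (possibly merged) partial sum.
            Buffer* expected = nullptr;
            if (pending.compare_exchange_strong(expected, mine)) break;
            // Another task parked a buffer meanwhile; loop and merge it too.
        }
        if (remaining.fetch_sub(1) == 1)        // last contributor
            return pending.exchange(nullptr);   // holds the complete sum
        return nullptr;
    }
};

int main() {
    constexpr int kInputs = 8;              // fan-in of the node (hypothetical)
    constexpr std::size_t kVoxels = 1 << 12;
    NodeAccumulator acc(kInputs);

    std::vector<std::thread> tasks;
    std::atomic<Buffer*> total{nullptr};
    for (int t = 0; t < kInputs; ++t) {
        tasks.emplace_back([&, t] {
            // Stand-in for an FFT- or direct-convolution task output.
            auto* out = new Buffer(kVoxels, float(t + 1));
            if (Buffer* sum = acc.contribute(out)) total.store(sum);
        });
    }
    for (auto& th : tasks) th.join();

    Buffer* sum = total.load();
    assert(sum && (*sum)[0] == 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8);
    delete sum;
    return 0;
}

In this pattern the only shared state is a single atomic pointer and a counter per node, so contention causes at most a short merge-and-retry rather than serialization in a critical section, which matches the goal stated in the abstract of reducing time spent in critical sections.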
