IEEE Transactions on Neural Networks and Learning Systems

Effective and Efficient Batch Normalization Using a Few Uncorrelated Data for Statistics Estimation


Abstract

Deep neural networks (DNNs) have thrived in recent years, with batch normalization (BN) playing an indispensable role. However, BN has been observed to be costly because of its heavy reduction and elementwise operations, which are hard to execute in parallel and thus substantially slow down training. To address this issue, in this article we propose a methodology that alleviates the cost of BN by using only a few sampled or generated data for mean and variance estimation at each iteration. The key challenge is to achieve a satisfactory balance between normalization effectiveness and execution efficiency: effectiveness calls for less data correlation in sampling, while efficiency calls for more regular execution patterns. To this end, we design two categories of approaches that sample or create a few uncorrelated data for statistics estimation under certain strategy constraints. The former includes "batch sampling (BS)", which randomly selects a few samples from each batch, and "feature sampling (FS)", which randomly selects a small patch from each feature map of all samples; the latter is "virtual data set normalization (VDN)", which generates a few synthetic random samples to directly create uncorrelated data for statistics estimation. Accordingly, multiway strategies are designed to reduce data correlation for accurate estimation and, at the same time, optimize the execution pattern for acceleration. The proposed methods are comprehensively evaluated on various DNN models, with negligible loss of model accuracy and convergence rate. Without the support of any specialized libraries, a 1.98× BN layer acceleration and a 23.2% overall training speedup are practically achieved on modern GPUs. Furthermore, our methods demonstrate strong performance on the well-known "micro-BN" problem that arises with tiny batch sizes. This article provides a promising solution for the efficient training of high-performance DNNs.
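To make the sampling idea concrete, below is a minimal PyTorch sketch of a BN layer that estimates statistics from only a few data, in the spirit of the batch sampling (BS) and feature sampling (FS) schemes described above. It is not the authors' released implementation; the class name SampledBatchNorm2d and the sample_ratio and mode arguments are hypothetical names introduced for illustration, and VDN is omitted.

```python
# Sketch: BN statistics estimated from a small random subset of the data.
# "batch" mode samples a few examples (BS); "feature" mode samples a small
# spatial patch from every feature map (FS). Assumed names, not the paper's API.
import torch
import torch.nn as nn

class SampledBatchNorm2d(nn.Module):
    def __init__(self, num_features, sample_ratio=0.25, mode="batch",
                 eps=1e-5, momentum=0.1):
        super().__init__()
        self.mode = mode                  # "batch" -> BS, "feature" -> FS
        self.sample_ratio = sample_ratio  # fraction of data used for statistics
        self.eps, self.momentum = eps, momentum
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def _sampled_stats(self, x):
        n, c, h, w = x.shape
        if self.mode == "batch":
            # BS: compute mean/var from a few randomly chosen samples.
            k = max(1, int(n * self.sample_ratio))
            idx = torch.randperm(n, device=x.device)[:k]
            sub = x[idx]
        else:
            # FS: compute mean/var from one small random patch per feature map.
            ph = max(1, int(h * self.sample_ratio))
            pw = max(1, int(w * self.sample_ratio))
            top = torch.randint(0, h - ph + 1, (1,)).item()
            left = torch.randint(0, w - pw + 1, (1,)).item()
            sub = x[:, :, top:top + ph, left:left + pw]
        return sub.mean(dim=(0, 2, 3)), sub.var(dim=(0, 2, 3), unbiased=False)

    def forward(self, x):
        if self.training:
            mean, var = self._sampled_stats(x)
            with torch.no_grad():
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean[None, :, None, None]) / torch.sqrt(
            var[None, :, None, None] + self.eps)
        return self.weight[None, :, None, None] * x_hat + self.bias[None, :, None, None]
```

In use, the layer is a drop-in replacement for nn.BatchNorm2d; the whole input is still normalized, but the mean and variance come from the sampled subset, which is where the reduction cost is saved.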
