JMLR: Workshop and Conference Proceedings

Understanding Generalization and Optimization Performance of Deep CNNs


Abstract

This work aims to provide an understanding of the remarkable success of deep convolutional neural networks (CNNs) by theoretically analyzing their generalization performance and establishing optimization guarantees for gradient-descent-based training algorithms. Specifically, for a CNN model consisting of $l$ convolutional layers and one fully connected layer, we prove that its generalization error is bounded by $\mathcal{O}(\sqrt{\theta\widetilde{\varrho}/n})$, where $n$ is the number of training samples, $\theta$ denotes the freedom degree of the network parameters, and $\widetilde{\varrho}=\mathcal{O}\big(\log\big(\prod_{i=1}^{l}b_{i}(k_{i}-s_{i}+1)/p\big)+\log(b_{l+1})\big)$ encapsulates architecture parameters, including the kernel size $k_{i}$, stride $s_{i}$, pooling size $p$, and parameter magnitude $b_{i}$. To the best of our knowledge, this is the first generalization bound that depends only on $\mathcal{O}(\log(\prod_{i=1}^{l+1}b_{i}))$, tighter than existing ones that all involve an exponential term such as $\mathcal{O}(\prod_{i=1}^{l+1}b_{i})$. Moreover, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk. This explains well why gradient descent training algorithms usually perform sufficiently well in practice. Furthermore, we prove one-to-one correspondence and convergence guarantees for the non-degenerate stationary points between the empirical and population risks. This implies that a computed local minimum of the empirical risk is also close to a local minimum of the population risk, thus ensuring that the optimized CNN model generalizes well to new data.
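To make the scaling of the bound concrete, the following is a minimal numerical sketch in Python. All values used here (kernel sizes, strides, pooling size, parameter magnitudes, the freedom degree $\theta$, and the sample count $n$) are hypothetical placeholders rather than figures from the paper; the snippet only evaluates the stated $\mathcal{O}(\sqrt{\theta\widetilde{\varrho}/n})$ expression and contrasts the logarithmic term $\widetilde{\varrho}$ with the exponential product $\prod_{i}b_{i}$ that appears in earlier bounds.

```python
import math

# Hypothetical 3-layer CNN architecture, used only for illustration;
# none of these numbers come from the paper itself.
kernel_sizes = [5, 5, 3]             # k_i
strides      = [1, 1, 1]             # s_i
pool_size    = 2                     # p
magnitudes   = [2.0, 2.0, 2.0, 2.0]  # b_1..b_l and b_{l+1} (fully connected layer)

# Architecture-dependent factor from the stated bound:
# rho_tilde = O( log( prod_i b_i * (k_i - s_i + 1) / p ) + log(b_{l+1}) )
prod = 1.0
for k, s, b in zip(kernel_sizes, strides, magnitudes[:-1]):
    prod *= b * (k - s + 1) / pool_size
rho_tilde = math.log(prod) + math.log(magnitudes[-1])

# Contrast with the exponential dependence prod_i b_i of earlier bounds:
# it grows multiplicatively with depth, while rho_tilde grows only additively.
exp_term = math.prod(magnitudes)

theta = 1e5   # hypothetical freedom degree of the network parameters
n = 1e6       # hypothetical number of training samples
bound = math.sqrt(theta * rho_tilde / n)   # O(sqrt(theta * rho_tilde / n))

print(f"rho_tilde (logarithmic term) = {rho_tilde:.3f}")
print(f"prod_i b_i (exponential term) = {exp_term:.3f}")
print(f"generalization bound = O({bound:.3f})")
```

Because $\widetilde{\varrho}$ enters only logarithmically, doubling every $b_{i}$ adds a constant to $\widetilde{\varrho}$, whereas it multiplies the exponential term $\prod_{i=1}^{l+1}b_{i}$ by $2^{l+1}$.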
