Memory-Efficient Models for Scene Text Recognition via Neural Architecture Search


Abstract

Meta-learning techniques based on neural architecture search (NAS) perform very well in designing deep neural network models. In particular, convolutional neural networks (CNNs) designed by NAS for image recognition outperform hand-designed models on public benchmark datasets such as CIFAR10 and ImageNet. Nevertheless, NAS has rarely been applied to real-world problems, i.e. recognition problems with a limited dataset. To apply NAS to a new image recognition field, we propose a method for the scene text recognition (STR) framework in which the NAS technique does not require a proxy task. Specifically, we propose an architecture space for the CNN-based modules in the STR framework and apply the ProxylessNAS method, which enables end-to-end training while the meta-learner designs a new model, using only a single commonly used GPU (approximately 100 GPU hours). The STR model obtained by the proposed NAS method was evaluated on seven STR benchmark datasets. The obtained model achieves performance similar to that of the ideal model in terms of accuracy and number of parameters. We thus confirm that NAS-based model design can be effectively applied to STR scenarios.
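
The abstract does not describe the search procedure in detail. As a rough illustration only, the sketch below shows the single-path sampling idea generally associated with ProxylessNAS-style search: each searchable layer holds several candidate operations, one is sampled per step from a softmax over architecture parameters, and only that path is executed, which keeps memory close to that of a single model. The operation names, the toy scalar "layers", and the `alphas` values are all hypothetical and are not the authors' architecture space or code. The update of the architecture parameters during search is omitted.

```python
import math
import random

# Hypothetical candidate operations for one searchable layer. In a real STR
# backbone these would be convolution variants; here they are toy functions.
CANDIDATE_OPS = {
    "conv3x3": lambda x: 3.0 * x,
    "conv5x5": lambda x: 5.0 * x,
    "identity": lambda x: x,
}


def softmax(logits):
    """Convert architecture parameters into path probabilities."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_path(alphas, rng):
    """Pick one candidate index per layer; only that path is kept active."""
    path = []
    for layer_alphas in alphas:
        probs = softmax(layer_alphas)
        path.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return path


def forward(x, path):
    """Run the input through the single sampled operation of each layer."""
    names = list(CANDIDATE_OPS)
    for index in path:
        x = CANDIDATE_OPS[names[index]](x)
    return x


def derive_architecture(alphas):
    """After the search, keep the highest-scoring operation in each layer."""
    names = list(CANDIDATE_OPS)
    return [names[max(range(len(a)), key=a.__getitem__)] for a in alphas]


rng = random.Random(0)
alphas = [[0.1, 2.0, -1.0], [1.5, 0.0, 0.2]]  # assumed values, two layers
path = sample_path(alphas, rng)
output = forward(1.0, path)
final_ops = derive_architecture(alphas)
```

With these assumed parameters, `derive_architecture` selects the operation with the largest parameter in each layer, while `sample_path` may pick any candidate in a given step.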


Bibliographic details

Source
IEEE Winter Applications of Computer Vision Workshops | 2020 | pp. 183-191 | 9 pages
Authors
SeulGi Hong; DongHyun Kim; Min-Kook Choi
Original format: PDF
Keywords
Computer architecture; Microprocessors; Training; Task analysis; Feature extraction; Computational modeling; Graphics processing units

Similar literature

1. Deep neural network with attention model for scene text recognition [J]. Shuohao Li, Min Tang, Qiang Guo. Computer Vision, IET. 2017, No. 7
2. Convolutional recurrent neural networks with hidden Markov model bootstrap for scene text recognition [J]. Fenglei Wang, Qiang Guo, Jun Lei. Computer Vision, IET. 2017, No. 6
3. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition [J]. Baoguang Shi, Xiang Bai, Cong Yao. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017, No. 11
4. Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising [C]. Haokui Zhang, Ying Li, Hao Chen. IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020
5. A neural model of scene understanding: Multiple-scale spatial and feature-based attention in scene search, learning, and recognition [D]. Huang, Tsung-Ren. 2010
6. Cursive-Text: A Comprehensive Dataset for End-to-End Urdu Text Recognition in Natural Scene Images [O]. Asghar Ali Chandio, Md. Asikuzzaman, Mark Pickering. 2020
7. Memory-Efficient Hierarchical Neural Architecture Search for Image Denoising [O]. Haokui Zhang, Ying Li, Hao Chen. 2020
