Weakly Supervised Text Attention Network for Generating Text Proposals in Scene Images

Abstract

Detection and recognition of textual information in scene images are useful but challenging tasks. Numerous methods have been proposed to solve the problem, and the best recent results have been attained by methods based on deep neural networks. Training such networks requires large amounts of data annotated at the bounding-box level or pixel level, and producing that data takes substantial labor, which is expensive and time-consuming. In this paper we explore the use of a weakly supervised deep neural network for generating text proposals in natural scene images. The network accepts multi-scale inputs and is trained to perform whole-image binary classification, that is, to tell whether an image contains text or not. After training, the network has learned powerful discriminative features that are capable of distinguishing text from other objects. To locate the text, a text confidence score map is generated from the feature maps of the top two convolutional layers by extracting a class activation map. The value of each pixel in the score map denotes the confidence that the pixel belongs to text. By applying a threshold, the score map is converted to a binary mask map whose foreground marks the probable text areas. Maximally Stable Extremal Regions (MSERs) are then extracted from these probable text areas and aggregated into groups. By processing these groups, text proposals are obtained. Experimental results show that, without using any bounding-box or pixel-level annotation, the algorithm achieves a recall rate comparable to some fully supervised methods on the ICDAR 2013 focused text dataset and on the ICDAR 2015 incidental text dataset.
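The score-map and mask steps described in the abstract can be illustrated with a minimal sketch. It assumes the standard class activation map formulation (a weighted sum of convolutional feature maps, with the weights taken from the classification layer for the "text" class) applied to a single set of feature maps; the abstract does not say how the top two convolutional layers are combined, so that part is not modeled. The function names `class_activation_map` and `threshold_mask`, the min-max normalisation, and the threshold value are illustrative assumptions, not the authors' implementation.

```python
from typing import List

Grid = List[List[float]]


def class_activation_map(feature_maps: List[Grid], class_weights: List[float]) -> Grid:
    """Weighted sum of K feature maps (each H x W), min-max normalised to [0, 1].

    Assumption: class_weights holds one weight per feature map for the "text"
    class, as in the standard CAM formulation.
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    if hi == lo:
        # Flat map: no pixel stands out, so report zero confidence everywhere.
        return [[0.0] * width for _ in range(height)]
    return [[(value - lo) / (hi - lo) for value in row] for row in cam]


def threshold_mask(score_map: Grid, threshold: float) -> List[List[int]]:
    """Binary mask: 1 marks a probable text pixel, 0 marks background."""
    return [[1 if value >= threshold else 0 for value in row] for row in score_map]


# Toy 3x3 example with two feature maps (values are made up for illustration).
toy_maps = [
    [[0.0, 0.0, 0.0], [0.0, 4.0, 4.0], [0.0, 0.0, 0.0]],
    [[2.0, 0.0, 0.0], [0.0, 2.0, 2.0], [0.0, 0.0, 0.0]],
]
toy_weights = [1.0, 0.5]
score_map = class_activation_map(toy_maps, toy_weights)
mask = threshold_mask(score_map, threshold=0.5)
```

In the pipeline the abstract describes, the foreground of such a mask would then restrict where MSERs are extracted; that stage and the grouping of regions into proposals are outside this sketch.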

Bibliographic details

Source
IAPR International Conference on Document Analysis and Recognition | 2017 | pp. 324-330 | 7 pages
Authors
Li Rong; En MengYi; Li JianQiang; Zhang HaiBin
Original format: PDF
Keywords
Proposals; Feature extraction; Object detection; Neural networks; Text recognition; Classification algorithms; Training

Similar literature

1. Weak supervision for generating pixel-level annotations in scene text segmentation [J]. Bonechi Simone, Bianchini Monica, Scarselli Franco. Pattern Recognition Letters, 2020, Issue Octa.
2. Natural scene text detection based on multiscale connectionist text proposal network [J]. Huang Min, Lan Chaohao, Huang Wei. 2020, Issue 13.
3. Text-Attentional Convolutional Neural Network for Scene Text Detection [J]. T. He, W. Huang, Y. Qiao. IEEE Transactions on Image Processing, 2016, Issue 6.
4. Weakly Supervised Text Attention Network for Generating Text Proposals in Scene Images [C]. Li Rong, En MengYi, Li JianQiang. IAPR International Conference on Document Analysis and Recognition, 2017.
5. Fuzzification of Supervised and Semi-Supervised Convolution Neural Networks for Identification of Neutral Text in Sentiment Analysis [D]. Najar, Rawan. 2020.
6. An Algorithm Based on Text Position Correction and Encoder-Decoder Network for Text Recognition in the Scene Image of Visual Sensors [O]. Zhiwei Huang, Jinzhao Lin, Hongzhi Yang. 2020.
7. Curved Text Detection in Natural Scene Images with Semi- and Weakly-Supervised Learning [O]. Xugong Qin, Yu Zhou, Dongbao Yang. 2019.
