IAPR International Conference on Document Analysis and Recognition

Weakly Supervised Text Attention Network for Generating Text Proposals in Scene Images

Abstract

Detecting and recognizing text in scene images is useful but challenging. Numerous methods have been proposed for the problem, and the best recent results come from methods based on deep neural networks. Training such networks requires large amounts of data annotated at the bounding-box or pixel level, and producing that data takes substantial labor, which is expensive and time-consuming. In this paper we explore the use of a weakly supervised deep neural network for generating text proposals in natural scene images. The network accepts multi-scale inputs and is trained to perform whole-image binary classification, that is, to tell whether an image contains text. After training, the network has learned powerful discriminative features capable of distinguishing text from other objects. To locate text, a text confidence score map is generated from the feature maps of the top two convolutional layers by extracting the class activation map. The value of each pixel in the score map is the confidence that the pixel belongs to text. Thresholding converts the score map into a binary mask map whose foreground regions are probable text areas. Maximally Stable Extremal Regions (MSERs) are then extracted from these probable text areas and aggregated into groups, and text proposals are obtained by processing these groups. Experimental results show that, without using any bounding-box or pixel-level annotation, the algorithm achieves a recall rate comparable to that of some fully supervised methods on the ICDAR 2013 focused text dataset and the ICDAR 2015 incidental text dataset.
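The score-map and thresholding steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the standard class activation map formulation (a weighted sum of feature channels using the classifier weights for the "text" class), min-max scaling to [0, 1], and a threshold of 0.5. The function names, the scaling, and the threshold value are hypothetical, and the abstract does not say how the maps from the top two convolutional layers are combined, so the sketch handles a single layer only.

```python
import numpy as np


def class_activation_map(features: np.ndarray, class_weights: np.ndarray) -> np.ndarray:
    """Weighted sum of feature channels, scaled to [0, 1].

    features:      (C, H, W) activations of one convolutional layer.
    class_weights: (C,) classifier weights for the "text" class (assumed).
    """
    # Contract the channel axis: result has shape (H, W).
    cam = np.tensordot(class_weights, features, axes=([0], [0]))
    # Min-max scaling is an assumption made for this sketch.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def score_to_mask(score_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binary mask whose True pixels are probable text areas."""
    return score_map >= threshold


if __name__ == "__main__":
    # Toy input: channel 0 fires on a 2x2 "text" patch, channel 1 elsewhere.
    features = np.zeros((2, 4, 4))
    features[0, 1:3, 1:3] = 1.0
    features[1, 0, 0] = 1.0
    weights = np.array([1.0, -0.5])  # hypothetical classifier weights

    score = class_activation_map(features, weights)
    mask = score_to_mask(score, threshold=0.5)
    print(mask.astype(int))
```

In the pipeline the abstract describes, MSERs would then be extracted only inside the foreground of this mask and grouped into text proposals. That stage is omitted here because the abstract gives no details of the grouping.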