Journal: IEEE Transactions on Image Processing

Text-Attentional Convolutional Neural Network for Scene Text Detection

Abstract

Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where cluttered background information may dominate the true text features in the deep representation. This leads to less discriminative power and poorer robustness. In this paper, we present a new system for scene text detection by proposing a novel text-attentional convolutional neural network (Text-CNN) that focuses on extracting text-related regions and features from the image components. We develop a new learning mechanism to train the Text-CNN with rich, multi-level supervised information, including a text region mask, a character label, and binary text/non-text information. The rich supervision gives the Text-CNN a strong capability for discriminating ambiguous texts and also increases its robustness against complicated background components. The training process is formulated as a multi-task learning problem, where low-level supervised information greatly facilitates the main task of text/non-text classification. In addition, a powerful low-level detector called contrast-enhancement maximally stable extremal regions (MSERs) is developed, which extends the widely used MSERs by enhancing the intensity contrast between text patterns and background. This allows it to detect highly challenging text patterns, resulting in a higher recall. Our approach achieved promising results on the ICDAR 2013 data set, with an F-measure of 0.82, substantially improving on the state-of-the-art results.
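
The multi-task formulation described in the abstract can be sketched as a weighted sum of three per-component losses, one for each kind of supervision. This is only an illustrative sketch: the abstract names the three supervision signals but not the loss functions or their weights, so the softmax cross-entropy terms, the squared-error mask term, and the names `multi_task_loss`, `char_weight`, and `mask_weight` are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of a softmax over `logits` against one target class."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target_index]


def mask_l2_loss(predicted_mask, target_mask):
    """Mean squared error between two flattened masks of equal length."""
    if len(predicted_mask) != len(target_mask):
        raise ValueError("masks must have the same length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted_mask, target_mask))
    return squared / len(target_mask)


def multi_task_loss(text_logits, is_text, char_logits, char_label,
                    predicted_mask, target_mask,
                    char_weight=1.0, mask_weight=1.0):
    """Combine the main and auxiliary losses for one image component.

    Main task: binary text/non-text classification.
    Auxiliary tasks: character label prediction and text region mask
    regression. The weights are hypothetical hyperparameters.
    """
    main = softmax_cross_entropy(text_logits, 1 if is_text else 0)
    char = softmax_cross_entropy(char_logits, char_label)
    mask = mask_l2_loss(predicted_mask, target_mask)
    return main + char_weight * char + mask_weight * mask


# One toy component: 2-way text logits, 4-way character logits, 2-pixel mask.
loss = multi_task_loss(
    text_logits=[0.2, 1.5], is_text=True,
    char_logits=[0.1, 0.3, 2.0, -1.0], char_label=2,
    predicted_mask=[0.9, 0.1], target_mask=[1.0, 0.0],
)
print(f"combined loss: {loss:.4f}")
```

Setting `char_weight` and `mask_weight` to zero reduces the sketch to plain text/non-text classification, which makes it easy to compare training with and without the auxiliary supervision.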