Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence

An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition



Abstract

Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. We propose a novel neural network architecture that integrates feature extraction, sequence modeling and transcription into a unified framework. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most existing algorithms, whose components are trained and tuned separately. (2) It naturally handles sequences of arbitrary length, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks. (4) It yields an effective yet much smaller model, which is more practical for real-world application scenarios. Experiments on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR datasets, demonstrate the superiority of the proposed algorithm over the prior art. Moreover, the proposed algorithm performs well on image-based music score recognition, which supports its generality.
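Properties (2) and (3) can be illustrated with a small sketch of a transcription step. The abstract does not say how transcription is done, so the code below assumes a CTC-style rule: the network emits one label per image column, consecutive repeats are merged, and a blank symbol is dropped. The names `BLANK`, `collapse`, `edit_distance` and `lexicon_lookup` are hypothetical. The nearest-word lookup by edit distance is a simple stand-in for lexicon-based recognition, not necessarily the procedure used in the paper.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol; an assumption, not taken from the abstract


def collapse(frame_labels, blank=BLANK):
    """Lexicon-free transcription: merge consecutive repeats, then drop blanks.

    No character segmentation is needed, and the input may have any length.
    """
    merged = (label for label, _ in groupby(frame_labels))
    return "".join(label for label in merged if label != blank)


def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def lexicon_lookup(text, lexicon):
    """Lexicon-based transcription (stand-in): pick the closest lexicon word."""
    return min(lexicon, key=lambda word: edit_distance(text, word))


if __name__ == "__main__":
    # A blank between the two "l" runs keeps the double letter.
    print(collapse("hh-e-ll-lo--"))
    print(lexicon_lookup("stale", ["state", "store", "slate"]))
```

The same `collapse` function works for per-column label sequences of any length, which is the sense in which no horizontal scale normalization is required in this sketch.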
