Conference: International Conference on Pattern Recognition

Recognizing Multiple Text Sequences from an Image by Pure End-to-End Learning


Abstract

We address a challenging problem: recognizing multiple text sequences from an image by pure end-to-end learning. The problem is twofold: 1) Multiple text sequence recognition. Each image may contain multiple text sequences that differ in content, location and orientation, and we aim to recognize all of them. 2) Pure end-to-end (PEE) learning. Each training image is labeled only with the text transcripts of the sequences it contains, without any geometric annotations. Most existing works recognize multiple text sequences from an image in a non-end-to-end (NEE) or quasi-end-to-end (QEE) way, in which each image is trained with both text transcripts and text locations. Only recently was a PEE method proposed to recognize a text sequence that is split across several lines in the image. However, that method cannot be directly applied to recognizing multiple text sequences from an image. In this paper, we therefore propose a pure end-to-end learning method for recognizing multiple text sequences from an image. Our method directly learns the probability distribution of multiple sequences conditioned on each input image, and outputs multiple text transcripts using a well-designed decoding strategy. To evaluate the proposed method, we construct several datasets, based mainly on an existing public dataset and two real application scenarios. Experimental results show that the proposed method can effectively recognize multiple text sequences from images, and that it outperforms CTC-based and attention-based baseline methods.
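The difference in supervision between PEE and NEE/QEE training can be illustrated with a minimal sketch. It is not taken from the paper: the class names (`PEESample`, `LocatedText`, `QEESample`), the polygon geometry format and the helper `strip_geometry` are hypothetical and serve only to show what "transcripts only, without any geometric annotations" means for a training label.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PEESample:
    """Hypothetical PEE label: an image plus the transcripts it contains."""
    image_path: str
    transcripts: List[str]  # no locations, no orientations


@dataclass
class LocatedText:
    """Hypothetical NEE/QEE annotation: a transcript with its geometry."""
    transcript: str
    polygon: List[Tuple[int, int]]  # assumed format: (x, y) vertices in pixels


@dataclass
class QEESample:
    """Hypothetical NEE/QEE label: every transcript carries a location."""
    image_path: str
    texts: List[LocatedText]


def strip_geometry(sample: QEESample) -> PEESample:
    """Drop all geometric annotations, keeping only the transcripts."""
    return PEESample(
        image_path=sample.image_path,
        transcripts=[t.transcript for t in sample.texts],
    )


# Illustrative data only; the file name and strings are made up.
located = QEESample(
    image_path="example.png",
    texts=[
        LocatedText("HELLO", [(10, 10), (90, 10), (90, 30), (10, 30)]),
        LocatedText("2024", [(15, 60), (60, 60), (60, 80), (15, 80)]),
    ],
)
pee_label = strip_geometry(located)
```

Under this assumed layout, a PEE learner would see only `pee_label.transcripts` during training, whereas an NEE or QEE pipeline would also consume each `polygon`.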
