Conference: International Conference on Document Analysis and Recognition

A Multi-task Network for Localization and Recognition of Text in Images

Abstract

We present an end-to-end trainable multi-task network for lexicon-free text extraction from complex documents. The network solves text localization and text recognition simultaneously, identifying text segments without post-processing, cropping, or word grouping. A convolutional backbone and a Feature Pyramid Network together provide a shared representation that benefits each of the three model heads: text localization, classification, and text recognition. To improve recognition accuracy, we describe a dynamic pooling mechanism that retains high-resolution information across all RoIs. For text recognition, we propose a convolutional mechanism with attention that outperforms more common recurrent architectures. We evaluate our model against benchmark datasets and comparable methods; it achieves high performance in challenging regimes of non-traditional OCR.
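
The abstract does not spell out how the dynamic pooling mechanism works. As a rough illustration of the general idea of keeping resolution across RoIs of different shapes, the sketch below max-pools a 2-D RoI feature map to a fixed height while letting the output width follow the RoI's aspect ratio. This is an assumed stand-in, not the authors' method: the function name `dynamic_pool`, the parameters `out_h` and `max_w`, and the aspect-ratio rule are all hypothetical.

```python
from typing import List


def dynamic_pool(feature: List[List[float]], out_h: int, max_w: int) -> List[List[float]]:
    """Max-pool a 2-D RoI feature map to a fixed height and an
    aspect-ratio-dependent width (hypothetical illustration only)."""
    in_h, in_w = len(feature), len(feature[0])
    # Assumption: the output width scales with the RoI's aspect ratio,
    # so long text lines keep more horizontal detail; it is capped at max_w.
    out_w = max(1, min(max_w, round(in_w * out_h / in_h)))

    pooled = []
    for i in range(out_h):
        # Row range covered by output row i (at least one input row).
        r0 = i * in_h // out_h
        r1 = max(r0 + 1, (i + 1) * in_h // out_h)
        row = []
        for j in range(out_w):
            # Column range covered by output column j (at least one input column).
            c0 = j * in_w // out_w
            c1 = max(c0 + 1, (j + 1) * in_w // out_w)
            row.append(max(feature[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

Under these assumptions, a wide RoI (for example 4 x 12) pools to a wider grid than a square one (4 x 4) at the same output height, whereas a fixed-size pooling would compress both to the same width.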