Conference: Future Technologies Conference
From Textline to Paragraph: A Promising Practice for Chinese Text Recognition

Abstract

Although handwritten Chinese text recognition (HCTR) has made tremendous progress over the past decades, traditional document analysis systems suffer from two main problems: (1) line-level annotations of position and transcript are costly to obtain; (2) the framework consists of several separately trained modules, which makes it difficult for the overall system to achieve satisfactory results. Handwritten paragraph recognition therefore attempts to incorporate textline segmentation and recognition into a single network. However, the large character set and the severe shortage of training samples make handwritten Chinese paragraph recognition (HCPR) challenging. In this paper, we propose a novel framework for HCPR. To make training faster and more stable, we put forward the Multi-Dimensional LSTM Convolutional Attention (MLCA) recognition framework. We also use a new writing-style-aware image synthesis method to overcome the problem of insufficient data. We conduct several experiments on the ICDAR-2013 competition dataset and the corresponding corrupted dataset. The results suggest that moving from HCTR to HCPR is a promising direction for Chinese document analysis systems.
