IAPR International Conference on Document Analysis and Recognition

Self-Training of BLSTM with Lexicon Verification for Handwriting Recognition



Abstract

Deep learning approaches now provide state-of-the-art performance in many computer vision tasks, such as handwriting recognition. However, the huge number of parameters in these models requires large annotated training datasets, which are difficult to obtain. Training neural networks with unlabeled data is thus one of the key problems for achieving significant progress in deep learning. In this article, we explore a new semi-supervised training strategy for training long short-term memory (LSTM) recurrent neural networks for isolated handwritten word recognition. Our self-training strategy relies on iteratively training a bidirectional LSTM recurrent neural network (BLSTM) on both labeled and unlabeled data. At each iteration, the currently trained network labels the unlabeled data and submits the resulting hypotheses to a very efficient "lexicon verification" rule; verified unlabeled samples are added to the labeled dataset at the end of the iteration. This verification stage has very low sensitivity to the lexicon size, and full word coverage of the dataset is not necessary for the semi-supervised method to be effective. The strategy enables self-training with a single BLSTM and shows promising results on the Rimes dataset.
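The abstract describes the self-training loop only at a high level. The following is a minimal, hypothetical Python sketch of such a loop, assuming the simplest form of the verification rule (accept a hypothesis only if it exactly matches a lexicon word); train_fn and decode_fn are placeholders for a real BLSTM/CTC training and decoding pipeline, not the authors' implementation.

    from typing import Callable, List, Set, Tuple

    def self_train(
        labeled: List[Tuple[object, str]],   # (word image, transcription) pairs
        unlabeled: List[object],             # word images without transcriptions
        lexicon: Set[str],                   # valid word vocabulary
        train_fn: Callable[[List[Tuple[object, str]]], object],  # data -> trained model
        decode_fn: Callable[[object, object], str],              # (model, image) -> hypothesis
        max_iters: int = 10,
    ) -> object:
        """Iteratively grow the labeled set with lexicon-verified hypotheses."""
        model = train_fn(labeled)
        for _ in range(max_iters):
            verified, rejected = [], []
            for image in unlabeled:
                hypothesis = decode_fn(model, image)
                # Lexicon verification (assumed exact-match rule): keep the
                # hypothesis only if it is a word of the lexicon.
                if hypothesis in lexicon:
                    verified.append((image, hypothesis))
                else:
                    rejected.append(image)
            if not verified:                 # nothing new passed verification: stop
                break
            labeled = labeled + verified     # verified samples join the labeled set
            unlabeled = rejected
            model = train_fn(labeled)        # retrain on the enlarged labeled set
        return model

Under this exact-match rule, enlarging the lexicon only adds more strings a hypothesis could match, which is consistent with the abstract's claim that the verification stage has very low sensitivity to lexicon size.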
