IAPR International Conference on Document Analysis and Recognition

Self-Training of BLSTM with Lexicon Verification for Handwriting Recognition


Abstract

Deep learning approaches now provide state-of-the-art performance in many computer vision tasks such as handwriting recognition. However, the huge number of parameters in these models requires large annotated training datasets, which are difficult to obtain. Training neural networks with unlabeled data is one of the key problems for achieving significant progress in deep learning. In this article, we explore a new semi-supervised training strategy for long short-term memory (LSTM) recurrent neural networks applied to isolated handwritten word recognition. Our self-training strategy relies on iteratively training a bidirectional LSTM recurrent neural network (BLSTM) on both labeled and unlabeled data. At each iteration, the currently trained network labels the unlabeled data and submits the results to a very efficient "lexicon verification" rule. Verified unlabeled data are added to the labeled dataset at the end of each iteration. This verification stage has very low sensitivity to the lexicon size, and full word coverage of the dataset is not necessary for the semi-supervised method to be effective. The strategy enables self-training with a single BLSTM and shows promising results on the Rimes dataset.
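The abstract describes the iterative procedure precisely enough to sketch it. Below is a minimal Python sketch of such a self-training loop with lexicon verification, assuming a hypothetical `model` object exposing `train` and `predict` methods and a `max_iters` cap; it illustrates the idea, not the authors' actual implementation.

```python
def self_train(model, labeled, unlabeled, lexicon, max_iters=10):
    """Iteratively grow the labeled set with lexicon-verified pseudo-labels.

    labeled:   list of (image, word) pairs
    unlabeled: list of images
    lexicon:   iterable of valid words
    Sketch only: `model.train` / `model.predict` are assumed interfaces.
    """
    lexicon = set(lexicon)  # O(1) membership tests for verification
    for _ in range(max_iters):
        model.train(labeled)  # retrain the BLSTM on the current labeled set
        verified, remaining = [], []
        for image in unlabeled:
            word = model.predict(image)  # network's transcription hypothesis
            # Lexicon verification rule: keep the sample only if the
            # decoded word is a valid lexicon entry.
            if word in lexicon:
                verified.append((image, word))
            else:
                remaining.append(image)
        if not verified:
            break  # nothing new passed verification; stop iterating
        labeled = labeled + verified  # promote verified pseudo-labels
        unlabeled = remaining
    return model
```

Because verification is a plain set-membership test, its cost and behavior depend only weakly on the lexicon size, which matches the low sensitivity to lexicon size claimed in the abstract.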
