Conference: IAPR International Workshop on Document Analysis Systems

Automatic Alignment of Handwritten Images and Transcripts for Training Handwritten Text Recognition Systems

Abstract

State-of-the-art Handwritten Text Recognition techniques are based on statistical models, such as hidden Markov models or recurrent neural networks for optical modeling of characters and N-grams for language modeling. These models are trained using well-known learning techniques such as Expectation-Maximization and backpropagation, so training data is needed to build them. For the optical models, the training data consists of text line images with their corresponding transcripts. Even when the transcript of a handwritten document is available, automatically putting the physical lines in the images into correspondence with the lines of the transcript is not an easy task. We present a method for automatically aligning handwritten text images and their respective transcripts. The approach automatically segments the images into lines and then recognizes them. An alignment confidence is obtained using the Levenshtein distance between the recognition results and the transcripts. The most confident lines are then used for training. Experiments carried out on a historical document show encouraging results.
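
The confidence step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the Levenshtein distance is turned into a confidence or how the most confident lines are chosen, so the length normalization, the `threshold` value, and the function names `alignment_confidence` and `select_confident_lines` are assumptions made for illustration only. The sketch also assumes that each recognized line has already been paired with a candidate transcript line.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def alignment_confidence(recognized: str, transcript: str) -> float:
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(recognized), len(transcript))
    if longest == 0:
        return 1.0  # two empty lines match trivially
    return 1.0 - levenshtein(recognized, transcript) / longest


def select_confident_lines(pairs: List[Tuple[str, str]],
                           threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Keep (recognized, transcript) pairs at or above the assumed threshold."""
    return [(rec, ref) for rec, ref in pairs
            if alignment_confidence(rec, ref) >= threshold]
```

Under these assumptions, a recognized line that differs from its transcript line in one character out of four gets a confidence of 0.75 and would be excluded from training at the default threshold of 0.8.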