International Workshop on Document Analysis Systems

Aligning Transcripts to Automatically Segmented Handwritten Manuscripts


Abstract

Training and evaluating techniques for handwriting recognition and retrieval is challenging because large ground-truthed datasets are difficult to create. This is especially true for historical handwritten datasets. In many instances the ground truth has to be created by manually transcribing each word, which is a very labor-intensive process. Transcriptions are sometimes available for some manuscripts, but they were created for other purposes, so correspondence at the word, line, or sentence level may not be available. To be useful for training and evaluation, a word-level correspondence must exist between the segmented handwritten word images and the ASCII transcriptions. Creating this correspondence, or alignment, is challenging because the segmentation is often error-prone and the ASCII transcription may also contain errors. Very little work has been done on aligning handwritten data to transcripts. Here, a novel automatic alignment algorithm based on Hidden Markov Models is described and tested. The algorithm achieves an average alignment accuracy of about 72.8% when aligning whole pages at a time on a set of 70 pages from the George Washington collection. This outperforms, by about 12%, a dynamic time warping alignment algorithm previously reported in the literature and tested on the same collection.
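To make the alignment problem concrete, the following is a minimal sketch of a monotonic Viterbi alignment between segmented word images and transcript words. It is not the algorithm of the paper, whose model details are not given in the abstract. Everything specific here is an assumption chosen for illustration: each image is represented only by its pixel width, the emission score is a Gaussian on width versus character count (`px_per_char`, `sigma`), and the transition probabilities for staying on a word, advancing one word, or skipping a word are hypothetical values.

```python
import math

def viterbi_align(image_widths, words, px_per_char=20.0, sigma=15.0):
    """Map each segmented image (in reading order) to a transcript word index.

    Hidden states are transcript word positions; observations are image widths.
    All parameters are illustrative assumptions, not values from the paper.
    """
    # Hypothetical log transition scores, keyed by how far the word index moves:
    # 0 = stay (over-segmentation), 1 = next word, 2 = skip one word.
    log_trans = {0: math.log(0.1), 1: math.log(0.8), 2: math.log(0.1)}
    n_obs, n_states = len(image_widths), len(words)
    neg_inf = float("-inf")

    def log_emit(width, word):
        # Unnormalized Gaussian log score: wider images suit longer words.
        expected = px_per_char * len(word)
        return -0.5 * ((width - expected) / sigma) ** 2

    score = [[neg_inf] * n_states for _ in range(n_obs)]
    back = [[0] * n_states for _ in range(n_obs)]
    # Assumption: the first image corresponds to the first transcript word.
    score[0][0] = log_emit(image_widths[0], words[0])

    for t in range(1, n_obs):
        for s in range(n_states):
            best, arg = neg_inf, 0
            for jump, lp in log_trans.items():
                prev = s - jump
                if prev >= 0 and score[t - 1][prev] + lp > best:
                    best, arg = score[t - 1][prev] + lp, prev
            if best > neg_inf:
                score[t][s] = best + log_emit(image_widths[t], words[s])
                back[t][s] = arg

    # Backtrace from the best-scoring final state.
    s = max(range(n_states), key=lambda k: score[-1][k])
    path = [s]
    for t in range(n_obs - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    return path

words = ["the", "President", "of", "Virginia"]
clean = viterbi_align([60, 180, 40, 160], words)
missed = viterbi_align([60, 180, 160], words)  # no image for "of"
```

In this toy setup, `clean` pairs each image with its own word, while `missed` skips the transcript word that has no image. A realistic system would replace the width feature with richer image features and estimate the transition and emission parameters from data.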
