Journal: Pattern Recognition Letters

Text alignment in early printed books combining deep learning and dynamic programming

Abstract

We describe a technique for transcript alignment in early printed books that combines deep models with dynamic programming algorithms. Two object detection models, both based on Faster R-CNN, are trained to locate words. We first train an initial model to recognize generic words and hyphens, using information about the number of words in each text line. Using this model's predictions on pages for which a line-by-line ground-truth annotation is available, we then train a second model able to detect landmark words. The alignment is based on identifying landmark words in pages where only the text corresponding to zones of the page is known. The proposed technique is evaluated on a publicly available digitization of the Gutenberg Bible, while the transcription is based on the Vulgata, a late 4th-century Latin translation of the Bible. (C) 2020 Elsevier B.V. All rights reserved.
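
The abstract does not spell out the dynamic-programming step. As an illustration only, the following minimal Python sketch shows one way a zone's known text could be aligned with the word boxes detected in that zone. It assumes that detections are already sorted in reading order, that each detection is either a generic word or a landmark word, and that landmarks carry their transcript spelling. The `Detection` class, the `align` function and the cost constants are hypothetical; the recurrence is a standard edit-distance-style monotonic alignment, not necessarily the one used by the authors.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical costs chosen for illustration; they are not taken from the paper.
GAP = 1.0       # leave a transcript word undetected, or discard a spurious detection
GENERIC = 0.2   # pair a generic-word detection with any transcript word
LANDMARK = 0.0  # pair a landmark detection with the identical transcript word


@dataclass
class Detection:
    """A detected word box; detections are assumed to be sorted in reading order."""
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels
    landmark: Optional[str] = None  # predicted landmark word, or None for a generic word


def pair_cost(det: Detection, word: str) -> float:
    if det.landmark is None:
        return GENERIC
    # A landmark detection may only be paired with its own word (hard constraint).
    return LANDMARK if det.landmark == word else float("inf")


def align(words: List[str], dets: List[Detection]) -> List[Tuple[int, Optional[Detection]]]:
    """Monotonic alignment of a zone's transcript to its detections.

    Returns one (word_index, detection or None) pair per transcript word.
    """
    n, m = len(words), len(dets)
    cost = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:  # pair word i-1 with detection j-1
                options.append((cost[i - 1][j - 1] + pair_cost(dets[j - 1], words[i - 1]), "pair"))
            if i > 0:  # word i-1 has no detection
                options.append((cost[i - 1][j] + GAP, "skip_word"))
            if j > 0:  # detection j-1 matches no word
                options.append((cost[i][j - 1] + GAP, "skip_det"))
            cost[i][j], move[i][j] = min(options)

    # Backtrack from the bottom-right cell to recover the alignment.
    result: List[Tuple[int, Optional[Detection]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "pair":
            result.append((i - 1, dets[j - 1]))
            i, j = i - 1, j - 1
        elif move[i][j] == "skip_word":
            result.append((i - 1, None))
            i -= 1
        else:  # "skip_det": drop a spurious detection
            j -= 1
    result.reverse()
    return result


if __name__ == "__main__":
    words = ["in", "principio", "creavit", "deus", "caelum"]
    dets = [Detection((0, 0, 10, 10)), Detection((12, 0, 30, 10)),
            Detection((32, 0, 45, 10), "deus"), Detection((47, 0, 60, 10))]
    for idx, det in align(words, dets):
        print(words[idx], det.box if det else None)
```

In the demo, the landmark detection pins "deus" to the third box, so the single undetected word must be one of the first three words. Generic detections alone could not resolve this, which is why landmarks act as anchors. Giving mismatched landmark pairings an infinite cost makes each landmark a hard constraint. A finite penalty would instead tolerate landmark misclassifications by the detector.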