Journal: Literary & linguistic computing

Transcribing a 17th-century botanical manuscript: Longitudinal evaluation of document layout detection and interactive transcription

Abstract

We present a process for cost-effective transcription of cursive handwritten text images, tested on a 1,000-page 17th-century book about botanical species. The process comprises two main tasks: (1) preprocessing, which covers page layout analysis, text line detection, and text line extraction; and (2) transcription of the extracted text line images. Both tasks were carried out with semiautomatic procedures designed to incrementally minimize user correction effort, by means of computer-assisted line detection and interactive handwritten text recognition technologies. The contribution of this work is threefold. First, we provide a detailed human-supervised transcription of a relatively large historical handwritten book, ready to be searched, indexed, and accessed by cultural heritage scholars as well as the general public. Second, we have conducted the first longitudinal study to date on interactive handwritten text recognition, for which we provide a comprehensive user assessment of the real-world performance of the technologies involved. Third, as a result of this process, we have produced detailed transcription and document layout information (i.e. high-quality labeled data) ready to be used by researchers working on automated technologies for document analysis and recognition.
