Published in: International Journal on Document Analysis and Recognition

Character segmentation and transcription system for historical Japanese books with a self-proliferating character image database

Abstract

This paper describes an interactive system that assists in transcribing digitized historical Japanese books printed from woodblocks between the seventeenth and nineteenth centuries. The system's main functions are layout analysis, character segmentation, transcription, and generation of a character image database. Using the system involves two major phases. In the first phase, the system automatically produces provisional character segmentation data; users then interactively edit the segmentation results and transcribe them into text. The information obtained in this phase is stored in the character image database. In the second phase, the system performs automatic character segmentation and transcription using the database built in the first phase. As the two phases are applied repeatedly to a variety of materials, the contents of the character image database grow, and the system's character segmentation and transcription performance improves accordingly. Because this scheme resembles successive generations, in which parents produce children and children in turn produce grandchildren, the database is called a self-proliferating database. The experiment showed that transcription accuracy increased as the number of character images in the database grew: when the database reached 37,000 character images, segmentation accuracy reached 83.7% and transcription accuracy reached 69.1%.
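The abstract does not say how stored character images are matched against new ones, so the following is only a minimal sketch of the two-phase, self-proliferating loop under loudly stated assumptions: each segmented character image is already reduced to a numeric feature vector, recognition is a plain nearest-neighbour lookup with a hypothetical distance threshold `max_distance`, and `CharacterImageDatabase`, `run_cycle`, and `ask_user` are hypothetical names, not the authors' system or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import dist
from typing import Callable, Optional

# Assumption: a character image has already been turned into a feature vector.
Features = tuple[float, ...]


@dataclass
class CharacterImageDatabase:
    """Hypothetical store of (feature vector, character label) pairs that grows with use."""

    samples: list[tuple[Features, str]] = field(default_factory=list)

    def add(self, features: Features, label: str) -> None:
        # A character whose transcription a user has supplied becomes a new sample.
        self.samples.append((tuple(features), label))

    def transcribe(self, features: Features, max_distance: float = 1.0) -> Optional[str]:
        # Automatic transcription (assumed nearest-neighbour matching): return the
        # label of the closest stored sample, or None if the database is empty or
        # no sample lies within max_distance.
        if not self.samples:
            return None
        nearest_features, nearest_label = min(
            self.samples, key=lambda sample: dist(sample[0], features)
        )
        if dist(nearest_features, features) > max_distance:
            return None
        return nearest_label


def run_cycle(
    db: CharacterImageDatabase,
    characters: list[Features],
    ask_user: Callable[[Features], str],
) -> tuple[int, int]:
    """One hypothetical pass over a book's characters.

    Characters the database can match are transcribed automatically; the rest
    go to the user, and the user's answers are stored so later passes need
    less manual work. Returns (automatic count, manual count).
    """
    automatic, manual = 0, 0
    for features in characters:
        label = db.transcribe(features)
        if label is None:
            # Interactive step: the user's transcription feeds the database.
            db.add(features, ask_user(features))
            manual += 1
        else:
            automatic += 1
    return automatic, manual
```

For example, with `ask_user = lambda f: "あ" if f[0] < 2.5 else "い"`, a first call on `[(0.0, 0.0), (5.0, 5.0)]` returns `(0, 2)` because the database starts empty, and a second call on `[(0.1, 0.0), (5.0, 4.9), (9.0, 9.0)]` returns `(2, 1)`: only the unfamiliar third character goes back to the user.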