Conference: International Conference on Document Analysis and Recognition

On the Improvement of Recognizing Single-Line Strings of Japanese Historical Cursive

Abstract

Transcribing historical Japanese documents is the first step toward preserving them as cultural assets. These documents are directly useful not only for disaster prevention but also for enriching Japanese culture. Indeed, a national project to comprehensively compile old documents has been ongoing for more than 100 years. However, Japanese historical cursive is difficult to read even for modern Japanese readers without training. In this paper, we report on research into recognizing single-line text written in Japanese historical cursive. Our method achieves more than 95% accuracy on text consisting of only 46 Hiragana characters, and 84.08% accuracy on text that includes thousands of Kanji characters. Both results are state of the art: the result on 46 Hiragana characters significantly outperforms the previous state of the art, and this is the first research to recognize thousands of cursive Kanji characters. Furthermore, we conducted various experiments to improve the recognition accuracy of historical cursive, including data augmentation for rare characters, enhancement with a language model, and fine-tuning with samples written by the same author as the test data. The results show that, because handwriting styles vary widely, fine-tuning with samples written by the same author as the test data is practically effective; it easily outperforms the improvements from data augmentation and the language model.
