Conference: International Conference on Document Analysis and Recognition

On the Improvement of Recognizing Single-Line Strings of Japanese Historical Cursive



Abstract

Transcribing historical Japanese documents is the first step toward preserving them as cultural assets. These documents are directly useful not only for disaster prevention but also for enriching Japanese culture. Indeed, a national project to comprehensively compile old documents has been under way for more than 100 years. However, Japanese historical cursive is difficult to read, even for modern Japanese readers, without training. In this paper, we report on research into recognizing single-line text written in Japanese historical cursive. We obtain more than 95% accuracy on text consisting of only the 46 Hiragana characters, and 84.08% accuracy on text that includes thousands of Kanji characters. Both results are state of the art: the result on the 46 Hiragana characters significantly outperforms the previous state of the art, and this is the first research to recognize thousands of cursive Kanji characters. Furthermore, we conducted various experiments to improve the recognition accuracy of historical cursive, including data augmentation for rare characters, enhancement with a language model, and fine-tuning with samples written by the same author as the test data. We found that, because handwriting styles vary widely, fine-tuning with samples written by the same author as the test data is effective in practice. It easily outperforms the improvements from data augmentation and the language model.
