On the Improvement of Recognizing Single-Line Strings of Japanese Historical Cursive

Abstract

Transcribing historical Japanese documents is the first step toward preserving them as cultural assets. These documents can be directly useful not only for disaster prevention but also for enriching Japanese culture; indeed, a national project aimed at comprehensively compiling old documents has been under way for more than 100 years. However, Japanese historical cursive is difficult to read, even for modern Japanese readers without training. In this paper, we report on research to recognize single-line text written in Japanese historical cursive. Our method achieves more than 95% accuracy on text consisting of only the 46 Hiragana characters, and 84.08% accuracy on text that includes thousands of Kanji characters. Both results are state of the art: our result on the 46 Hiragana characters significantly outperforms the previous state of the art, and this is the first study to recognize thousands of cursive Kanji characters. We also conducted various experiments to improve the recognition accuracy of historical cursive, including data augmentation for rare characters, enhancement with a language model, and fine-tuning with samples written by the same author as the test data. Because handwriting styles vary widely, fine-tuning with samples written by the same author as the test data proved practically effective, and it easily outperformed the improvements obtained from data augmentation and the language model.
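The abstract reports accuracy figures (more than 95% and 84.08%) without stating the metric on this page. A common convention for single-line text recognition, assumed here and not confirmed by the source, is character accuracy derived from the Levenshtein edit distance between the ground-truth transcription and the recognizer output: 1 − (substitutions + deletions + insertions) / (number of ground-truth characters), pooled over all test lines. The following is a minimal Python sketch under that assumption; the function names `edit_distance` and `char_accuracy` are hypothetical and not taken from the paper.

```python
from typing import Iterable, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of single-character
    substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the output
                prev[j - 1] + (r != h),  # substitute (free if characters match)
            )
        prev = curr
    return prev[-1]


def char_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Assumed metric: 1 - total edit distance / total reference length,
    pooled over (ground truth, prediction) line pairs. The value can be
    negative if predictions contain many spurious characters."""
    errors = 0
    total = 0
    for ref, hyp in pairs:
        errors += edit_distance(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("ground-truth lines must not all be empty")
    return 1.0 - errors / total


if __name__ == "__main__":
    # Two toy Hiragana lines; the second has one substituted character (る -> ろ).
    lines = [("いろはにほへと", "いろはにほへと"), ("ちりぬるを", "ちりぬろを")]
    print(char_accuracy(lines))
```

Pooling errors over all characters, rather than averaging per-line accuracies, keeps short lines from dominating the score; whether the paper pools this way is not stated here.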

Bibliographic details

Source: International Conference on Document Analysis and Recognition, 2019, 1 vol., 8 pages
Author: Ayumu Nagai
Original format: PDF
Chinese Library Classification: Automation technology and equipment
Keywords: Character recognition; Text recognition; Cultural differences; Training; Deep learning; Data models; Writing

Similar documents

1. Recognizing Cursive Typewritten Text Using Segmentation-Free System [J]. Mohammad S. Khorsheed. ScientificWorldJournal, 2015, no. 3

2. On the Improvement of Recognizing Single-Line Strings of Japanese Historical Cursive [C]. Ayumu Nagai. International Conference on Document Analysis and Recognition, 2019

3. Historical study of the thoughts on Japanese emigrants in Japanese society (Japanese text) [D]. Mariko Tagawa. 2004

4. Recognizing Cursive Typewritten Text Using Segmentation-Free System [O]. Mohammad S. Khorsheed. 2015

5. Comparison and Improvement of String Matching Algorithms for Japanese Texts [O]. Jeehee Yoon, Toshihisa Takagi, Kazuo Ushijima. 1986

