On the Improvement of Recognizing Single-Line Strings of Japanese Historical Cursive


Abstract

Transcribing historical Japanese documents is the first step toward preserving them as cultural assets. These documents are directly useful not only for disaster prevention but also for enriching Japanese culture. Indeed, a national project to compile old documents comprehensively has been under way for more than 100 years. However, Japanese historical cursive is difficult to read, even for modern Japanese readers, without training. In this paper, we report on research into recognizing single-line text written in Japanese historical cursive. Our method achieves more than 95% accuracy on text consisting of only 46 Hiragana characters, and 84.08% accuracy on text that includes thousands of Kanji characters. Both results are state of the art: the result on 46 Hiragana characters significantly outperforms the previous state of the art, and this is the first research to recognize thousands of cursive Kanji characters. Furthermore, we conducted various experiments to improve recognition accuracy for historical cursive, including data augmentation for rare characters, enhancement with a language model, and fine-tuning with samples written by the same author as the test data. We found that, because handwriting styles vary widely, fine-tuning with samples written by the same author as the test data is effective in practice; it easily outperforms the improvements obtained from data augmentation and the language model.


Bibliographic details

Source
International Conference on Document Analysis and Recognition | 2019 | pp. 621-628 | 8 pages
Author
Ayumu Nagai
Original format: PDF
Keywords
Character recognition; Text recognition; Cultural differences; Training; Deep learning; Data models; Writing

Date indexed: 2022-08-26 14:34:51

Related literature

1. Recognizing Cursive Typewritten Text Using Segmentation-Free System [J]. Mohammad S. Khorsheed. ScientificWorldJournal, 2015, No. 3.
2. On the Improvement of Recognizing Single-Line Strings of Japanese Historical Cursive [C]. Ayumu Nagai. International Conference on Document Analysis and Recognition, 2019.
3. Historical study of the thoughts on Japanese emigrants in Japanese society (Japanese text) [D]. Tagawa, Mariko. 2004.
4. Recognizing Cursive Typewritten Text Using Segmentation-Free System [O]. Mohammad S. Khorsheed. 2015.
5. COMPARISON AND IMPROVEMENT OF STRING MATCHING ALGORITHMS FOR JAPANESE TEXTS [O]. YOON Jeehee, TAKAGI Toshihisa, USHIJIMA Kazuo. 1986.
