Transcribing historical Japanese documents is the first step toward preserving them as cultural assets. These documents are directly useful not only for disaster prevention but also for enriching Japanese culture. Indeed, a national project that aims to comprehensively compile old documents has been ongoing for more than 100 years. However, historical Japanese cursive is difficult to read, even for modern Japanese readers, without training. In this paper, we report on research into recognizing single-line text written in historical Japanese cursive. We achieve more than 95% accuracy on text consisting only of the 46 Hiragana characters, and 84.08% accuracy on text that includes thousands of Kanji characters. Both results are state of the art: our result on the 46 Hiragana characters significantly outperforms the previous state of the art, and this is the first work to recognize thousands of cursive Kanji characters. Furthermore, we conducted various experiments to improve recognition accuracy on historical cursive, including data augmentation for rare characters, enhancement with a language model, and fine-tuning with samples written by the same author as the test data. We found that, because handwriting styles vary widely, fine-tuning with samples written by the same author as the test data is effective in practice; it easily outperforms the improvements from data augmentation and the language model.