Conference: International Conference on Frontiers in Handwriting Recognition

Training Schemes for the Transliteration of the Balinese Script Into the Latin Script on Palm Leaf Manuscript Images

Abstract

Given the importance of the contents of Balinese palm leaf manuscripts, a transliteration system has to be developed so that these manuscripts can be read easily. The challenge is that the Balinese script is syllabic, and the mapping between linguistic symbols and images of symbols is not straightforward. In addition, because very little training data is available, adaptations of LSTM within the transliteration training scheme need to be designed, analyzed, and evaluated. This paper proposes and evaluates several adapted segmentation-free training schemes for transliterating the Balinese script into the Latin script from palm leaf manuscript images. We describe the generated synthetic dataset and the proposed training schemes at two levels (word level and text line level) for transliterating real words and text lines from palm leaf manuscript images. For word transliteration, training schemes at word level generally perform better than training schemes at text line level. By comparison, the segmentation-based transliteration method gives a very promising result. For text line transliteration, the segmentation-based method outperforms all segmentation-free training schemes on the less degraded collections, while the segmentation-free training schemes contribute to transliterating text lines from more degraded manuscripts. Training at text line level starting from a model pre-trained at word level could give a better result in word transliteration while still keeping optimal performance for text line transliteration.
