Journal: International Journal on Document Analysis and Recognition

A knowledge-based recognition system for historical Mongolian documents

Abstract

This paper proposes a knowledge-based system for recognizing historical Mongolian documents, in which words exhibit remarkable variation and character overlapping. Based on the characteristics of Mongolian word formation, the system combines a holistic scheme and a segmentation-based scheme for word recognition. Several types of words and isolated suffixes that either cannot be segmented into glyph-units or do not require segmentation are recognized with the holistic scheme. The remaining words are recognized with the segmentation-based scheme, which is the focus of this paper and which uses knowledge of glyph characteristics to segment words into glyph-units. Convolutional neural networks are employed both for word recognition in the holistic scheme and for glyph-unit recognition in the segmentation-based scheme. Based on an analysis of recognition errors in the segmentation-based scheme, the system is enhanced by integrating three strategies into glyph-unit recognition: incorporating baseline information, grouping glyph-units, and recognizing under-segmented and over-segmented fragments. The proposed system achieves 80.86% word accuracy on the Mongolian Kanjur test samples.
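
The two-scheme routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it (`recognize_word`, `use_holistic`, `holistic_classifier`, `segment_into_glyph_units`, `glyph_unit_classifier`, `word_accuracy`) is hypothetical, the classifiers are passed in as plain callables instead of trained convolutional networks, and the three enhancement strategies are not modeled. Word accuracy is assumed here to mean the fraction of words whose predicted label exactly matches the reference.

```python
from typing import Callable, Sequence, TypeVar

Image = TypeVar("Image")  # stand-in type for a word or glyph-unit image


def recognize_word(
    word_image: Image,
    use_holistic: Callable[[Image], bool],
    holistic_classifier: Callable[[Image], str],
    segment_into_glyph_units: Callable[[Image], Sequence[Image]],
    glyph_unit_classifier: Callable[[Image], str],
) -> str:
    """Route one word image to the holistic or the segmentation-based scheme.

    All five parameters are hypothetical placeholders for this sketch.
    """
    # Holistic scheme: the whole word (or isolated suffix) is classified at once.
    if use_holistic(word_image):
        return holistic_classifier(word_image)
    # Segmentation-based scheme: split into glyph-units, classify each one,
    # then join the per-unit labels into the word label.
    glyph_units = segment_into_glyph_units(word_image)
    return "".join(glyph_unit_classifier(unit) for unit in glyph_units)


def word_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of words whose predicted label equals the reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one reference word is required")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

In a real system the routing predicate would encode the word-formation knowledge the abstract refers to; here it is left as a caller-supplied function so the sketch makes no claim about what that rule is.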