
Recognition-based Segmentation for Digitization of Korean Historical Document Pages


Abstract

We present a recognition-based digitization method for building a digital library from a large volume of historical archives. Digitizing historical document pages is essential for providing retrieval services and preventing damage, but it requires laborious manual verification to produce accurate output. In this paper, a split-merge approach is applied to segment overlapping and touching characters written with thick brushes. Character string images are split into primitive segments along nonlinear segmentation paths that pass through maximum-curvature points. The split segments are then merged within a single probabilistic framework that integrates layout analysis, context information, and recognition results. In experiments, our system achieved a 96.4% character recognition rate on the test data set, despite the obsolete characters and unique variants used in the archives. We conclude that the method can be applied to digitizing Korean historical document pages and can minimize manual verification.
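
The merge step of a split-merge segmenter can be illustrated with a minimal sketch. The abstract does not say how the best merge is searched for, so the dynamic program below is an assumption, not the authors' procedure. The names `best_merge`, `toy_score`, and `widths`, the expected character width of 22, and the limit of three segments per character are all hypothetical. The toy score looks only at segment width, standing in for the combined layout, context, and recognition probability.

```python
import math
from typing import Callable, List, Tuple


def best_merge(
    n_segments: int,
    score: Callable[[int, int], float],
    max_merge: int = 3,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Group primitive segments 0..n-1 into consecutive runs with maximal total score.

    score(i, j) is a hypothetical log-probability-like value for the hypothesis
    that segments i..j-1 together form one character.
    """
    best = [-math.inf] * (n_segments + 1)  # best[j]: best score covering segments 0..j-1
    back = [0] * (n_segments + 1)          # back[j]: start index of the last group
    best[0] = 0.0
    for j in range(1, n_segments + 1):
        for i in range(max(0, j - max_merge), j):
            cand = best[i] + score(i, j)
            if cand > best[j]:
                best[j] = cand
                back[j] = i
    # Trace the chosen groups back from the end of the string.
    groups: List[Tuple[int, int]] = []
    j = n_segments
    while j > 0:
        groups.append((back[j], j))
        j = back[j]
    groups.reverse()
    return best[n_segments], groups


# Toy data (assumed): widths of five primitive segments in one character string.
widths = [10, 12, 22, 9, 11]
EXPECTED_WIDTH = 22  # hypothetical typical character width


def toy_score(i: int, j: int) -> float:
    # Stand-in for the combined probability: penalize deviation from the
    # expected character width. A real system would also use recognizer
    # confidence and context.
    return -abs(sum(widths[i:j]) - EXPECTED_WIDTH) / EXPECTED_WIDTH


total, groups = best_merge(len(widths), toy_score)
print(groups)
```

Here the over-split pieces of widths 10 and 12, and of widths 9 and 11, are merged back into single characters, while the segment of width 22 is left alone. Replacing `toy_score` with a sum of log-probabilities from several sources would keep the same search unchanged.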
