21st Nordic Conference of Computational Linguistics

Improving Optical Character Recognition of Finnish Historical Newspapers with a Combination of Fraktur & Antiqua Models and Image Preprocessing

Abstract

In this paper, we describe a method for improving the output of the optical character recognition (OCR) toolkit Tesseract on Finnish historical documents. First, we create a model for Finnish Fraktur fonts. Second, we test Tesseract with the created Fraktur model and an Antiqua model on single images and on combinations of images produced with different image preprocessing methods. Compared with the commercial ABBYY FineReader toolkit, our method achieves a word-level improvement of 27.48% (FineReader 7 or 8) and 9.16% (FineReader 11).
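The abstract does not say how the outputs from different models and preprocessed images are combined, so the following is only a minimal sketch of one plausible scheme: for each word position, keep the candidate with the highest recognizer confidence, then score the result at word level against a ground truth. The names `combine_runs` and `word_accuracy`, the run labels, the assumption that all runs are aligned word-for-word, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# One OCR run = the words recognized for one (model, preprocessing) pair,
# each with a confidence in [0, 100]. All runs are ASSUMED to be aligned
# word-for-word, which real OCR output does not guarantee.
Run = List[Tuple[str, float]]


def combine_runs(runs: Dict[str, Run]) -> List[str]:
    """Pick, for every word position, the candidate with the highest confidence."""
    if not runs:
        return []
    lengths = {len(run) for run in runs.values()}
    if len(lengths) != 1:
        raise ValueError("runs must be aligned to the same number of words")
    combined = []
    for candidates in zip(*runs.values()):
        best_word, _ = max(candidates, key=lambda pair: pair[1])
        combined.append(best_word)
    return combined


def word_accuracy(words: List[str], truth: List[str]) -> float:
    """Percentage of positions where the recognized word equals the ground truth."""
    if not truth:
        return 0.0
    correct = sum(1 for w, t in zip(words, truth) if w == t)
    return 100.0 * correct / len(truth)


# Invented toy data for illustration only; these are not results from the paper.
runs = {
    "fraktur+grayscale": [("Suomen", 91.0), ("wanha", 62.0), ("lehti", 88.0)],
    "antiqua+binarized": [("Snomen", 55.0), ("wanha", 80.0), ("lehti", 47.0)],
    "fraktur+binarized": [("Suomen", 85.0), ("wauha", 40.0), ("lehtl", 30.0)],
}
truth = ["Suomen", "wanha", "lehti"]

combined = combine_runs(runs)
print(combined, word_accuracy(combined, truth))
```

In this toy case no single run gets every word right, but the combination does, which is the intuition behind testing combinations of images rather than single images. A real pipeline would also need to align words across runs and might rank candidates by something other than raw confidence.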