Conference: ACM/IEEE-CS Joint Conference on Digital Libraries

Glyph miner: A system for efficiently extracting glyphs from early prints in the context of OCR

Abstract

While off-the-shelf OCR systems work well on many modern documents, the heterogeneity of early prints provides a significant challenge. To achieve good recognition quality, existing software must be “trained” specifically to each particular corpus. This is a tedious process that involves significant user effort. In this paper we demonstrate a system that generically replaces a common part of the training pipeline with a more efficient workflow: Given a set of scanned pages of a historical document, our system uses an efficient user interaction to semi-automatically extract large numbers of occurrences of glyphs indicated by the user. In a preliminary case study, we evaluate the effectiveness of our approach by embedding our system into the workflow at the University Library Würzburg.
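
The abstract does not say how occurrences of a user-indicated glyph are located, so the following is only an illustrative sketch of one generic approach, plain template matching on a binarized page, and not the system's actual method. The names `match_score` and `find_candidates`, the 0/1 image representation, and the `threshold` parameter are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumption: a binarized scan stored as rows of pixels, 1 = ink, 0 = background.
Image = List[List[int]]


def match_score(page: Image, template: Image, top: int, left: int) -> float:
    """Fraction of template pixels that agree with the page window at (top, left)."""
    h, w = len(template), len(template[0])
    agree = sum(
        1
        for r in range(h)
        for c in range(w)
        if page[top + r][left + c] == template[r][c]
    )
    return agree / (h * w)


def find_candidates(
    page: Image, template: Image, threshold: float = 0.9
) -> List[Tuple[int, int, float]]:
    """Slide the user-selected glyph over the page and collect likely occurrences.

    Returns (top, left, score) for every window whose agreement with the
    template reaches the threshold, in row-major scan order.
    """
    h, w = len(template), len(template[0])
    hits = []
    for top in range(len(page) - h + 1):
        for left in range(len(page[0]) - w + 1):
            score = match_score(page, template, top, left)
            if score >= threshold:
                hits.append((top, left, score))
    return hits
```

In a semi-automatic workflow of the kind described, a list like the one returned by `find_candidates` would presumably be shown to the user for confirmation or rejection rather than accepted blindly. The brute-force scan is kept for clarity and would be too slow for full-resolution scans.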