Conference: Sanskrit computational linguistics

Applying the OCRopus OCR System to Scholarly Sanskrit Literature

Abstract

OCRopus is an open-source OCR system, currently under development, intended to be omni-lingual and omni-script. Besides modern digital-library applications, the system can be used to capture and recognize classical literature as well as the large body of research literature about the classics. OCRopus advances the state of the art in several ways: new text-recognition and layout-analysis modules can be plugged in easily, character recognition is adaptive and user-extensible, and layout analysis is statistical and trainable. Of particular interest for computational-linguistics applications are the consistent use of probability estimates throughout the system and the use of weighted finite-state transducers to represent both alternative recognition hypotheses and statistical language models. In this paper, I first give an overview of these technologies and their relevance to digital-library applications in the humanities. I then focus on statistical language models and on how they are used to integrate OCR output with subsequent computational-linguistics and information-extraction modules.
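The following is a minimal sketch of the idea of weighting alternative recognition hypotheses with a statistical language model. It is a toy illustration under stated assumptions, not OCRopus code or its API, and it does not use a real WFST library such as OpenFst.

The sketch makes these assumptions:

- The OCR output is a hypothetical linear lattice, with one dict of character candidates and their probabilities per position.
- The language model is a made-up character bigram table.
- The cheapest path is found by dynamic programming in the tropical (min, +) semiring. For this linear lattice, that matches a shortest-path search over the composition of a hypothesis acceptor with a bigram automaton.
- `FLOOR`, `lm_weight` and all probabilities are illustrative.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical OCR output: one dict per character position, mapping each
# candidate character to the recognizer's (assumed) posterior probability.
Lattice = List[Dict[str, float]]
# Hypothetical character bigram model: (previous, current) -> P(current | previous).
Bigram = Dict[Tuple[str, str], float]

START, END = "<s>", "</s>"
FLOOR = 1e-4  # assumed probability for bigrams missing from the table


def lm_cost(bigrams: Bigram, prev: str, cur: str) -> float:
    """Negative log probability of `cur` following `prev` under the bigram model."""
    return -math.log(bigrams.get((prev, cur), FLOOR))


def decode(lattice: Lattice, bigrams: Bigram, lm_weight: float = 1.0) -> Tuple[str, float]:
    """Viterbi search in the tropical (min, +) semiring.

    The search state is the last character emitted. Each arc costs
    -log P_ocr(c) + lm_weight * -log P_lm(c | prev), which mirrors a
    shortest-path search over (OCR lattice) composed with (bigram model).
    """
    # best[state] = (accumulated cost, cheapest string ending in this state)
    best: Dict[str, Tuple[float, str]] = {START: (0.0, "")}
    for candidates in lattice:
        nxt: Dict[str, Tuple[float, str]] = {}
        for prev, (cost, text) in best.items():
            for ch, p in candidates.items():
                c = cost - math.log(p) + lm_weight * lm_cost(bigrams, prev, ch)
                if ch not in nxt or c < nxt[ch][0]:
                    nxt[ch] = (c, text + ch)
        best = nxt
    # Close every path with the end-of-string transition and keep the cheapest.
    final_cost, final_text = min(
        (cost + lm_weight * lm_cost(bigrams, prev, END), text)
        for prev, (cost, text) in best.items()
    )
    return final_text, final_cost


# Toy example: the recognizer slightly prefers "c" over "e" in position 2,
# but the bigram model has seen "de" and "ev" and not "dc" or "cv".
lattice = [
    {"d": 0.9, "a": 0.1},
    {"c": 0.55, "e": 0.45},
    {"v": 0.8, "y": 0.2},
    {"a": 0.95, "o": 0.05},
]
bigrams = {(START, "d"): 0.1, ("d", "e"): 0.3, ("e", "v"): 0.2, ("v", "a"): 0.4, ("a", END): 0.5}

print(decode(lattice, bigrams, lm_weight=0.0)[0])  # OCR scores only → dcva
print(decode(lattice, bigrams)[0])                 # OCR + language model → deva
```

With `lm_weight=0.0`, only the recognizer's scores count, so the slightly preferred misreading wins. With the language model enabled, the path through attested bigrams wins instead. A realistic lattice would also need segmentation alternatives, such as one glyph read as two characters. A one-character-per-position lattice like this one cannot express those, but a general weighted transducer can.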