Conference: International Conference on Theory and Practice of Digital Libraries

Creation of Textual Versions of Historical Documents from Polish Digital Libraries

Abstract

This paper describes the results of initial work aimed at increasing the number and improving the quality of textual versions of the historical documents available in Polish digital libraries. The digital library community lacks tools that integrate the existing digitisation workflow with a customizable OCR engine and crowd-based text correction; this paper describes work on providing such a solution. In addition to reviewing the current state of the art in this field, the paper describes the Virtual Transcription Laboratory (VTL) prototype, a crowdsourcing platform that utilizes the Tesseract OCR engine. The last section outlines the results of the prototype's evaluation on a real-life dataset of historical documents from the IMPACT project. The results demonstrate the applicability of the proposed solution as an enhancement of the digitisation workflow.
