Document digitization technology and its application for digital library in China

Abstract

We present research on document digitization technology and its application to building digital libraries in China. We focus on two major objectives of document digitization: performance and efficiency. Taking TH-OCR, the most representative product, as an example, we present recent research achievements in China on both kernel OCR technologies and peripheral technologies. The kernel technologies include high-performance multilingual (Chinese, Japanese, Korean and English) text recognition and layout analysis, understanding and reconstruction; the peripheral technologies include the network document digitization workflow and intelligent proofreading, which greatly improve efficiency. TH-OCR applications produce two types of final digital document: a reconstructed electronic document that carries the full text and layout information of the original paper document, and a multilevel document in which the OCR output text layer lies beneath the image layer. Numerous applications indicate that current technologies can greatly ease the large-volume digitization work involved in building digital library infrastructure.