Conference: IAPR International Workshop on Document Analysis Systems

Comprehensive Global Typography Extraction System for Electronic Book Documents

Abstract

Book documents usually have consistent typographies throughout the whole book, including headers, footers, columns, text line directions, and the fonts used at each level of heading. Such document-level typography information is of great value for downstream document processing applications. This paper presents a document analysis system that can extract a comprehensive set of typographies used in book documents. The system consists of several components: recognition of the fonts used in the body text and chapter headings; detection of the page body area, headers, and footers; and detection of the columns, text line direction, and line spacing of the body text. Page association is employed in the system. Preliminary experimental results demonstrate the effectiveness of the system.
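
The abstract does not define page association. One plausible reading, offered here only as an assumption, is that evidence is pooled across pages, so that text recurring at the same position on many pages is treated as a header or footer. The sketch below illustrates that idea and is not the authors' method. The names `normalize` and `find_running_lines`, the `min_fraction` parameter, and the input format (each page as a list of text lines in reading order) are all hypothetical.

```python
import re
from collections import Counter


def normalize(text):
    """Collapse whitespace and replace digit runs with '#',
    so that 'Page 12' and 'Page 13' compare as equal."""
    return re.sub(r"\d+", "#", " ".join(text.split())).lower()


def find_running_lines(pages, position="top", min_fraction=0.5):
    """Return normalized strings that recur at one position across pages.

    pages: list of pages, each a list of text lines in reading order
           (an assumed input format, not taken from the paper).
    position: "top" inspects the first line of each page, "bottom" the last.
    min_fraction: share of pages on which a string must appear to count.
    """
    index = 0 if position == "top" else -1
    # Count how often each normalized first/last line occurs; skip empty pages.
    counts = Counter(normalize(page[index]) for page in pages if page)
    threshold = min_fraction * len(pages)
    return {text for text, count in counts.items() if count >= threshold}
```

For instance, given three pages whose first lines are "My Book", "My Book", and "Chapter 2", and whose last lines are the page numbers "12", "13", and "14", the top position yields the running title and the bottom position yields the digit placeholder that stands for a page number. A real system would more likely compare geometric positions and fonts as well as text.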
