Logical structure analysis of book document images using contents information

Abstract

Document image structure analysis has been studied extensively, with particular emphasis on media conversion and layout analysis. Converting a library's book collection into hypertext documents requires logical structure extraction in addition to document layout analysis. A book's table of contents generally gives a concise and faithful representation of the logical structure of the entire book, so that structure can be analyzed efficiently by making full use of the contents pages. This paper proposes a new approach to document logical structure analysis that converts document images and contents information into an electronic document. First, the contents pages of the book are analyzed to acquire the overall logical structure of the document. This information is then used to acquire the logical structure of all pages of the book by analyzing consecutive pages of a portion of the book. Test results demonstrate very high discrimination rates: up to 97.6% for the headline structure, 99.4% for the text structure, 97.8% for the page-number structure and almost 100% for the head-foot structure.
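The two-stage idea in the abstract (read the contents pages first, then use them to label body pages) can be illustrated with a minimal sketch. This is not the paper's method: it assumes the contents pages have already been recognized as text lines in a hypothetical "number, title, dot leaders, page" format, and the names `TOC_LINE`, `TocEntry`, `parse_toc`, and `section_for_page` are invented for illustration.

```python
import re
from dataclasses import dataclass

# Assumed (hypothetical) line format: "<section number> <title> <dot leaders> <page>",
# for example "2.1 Block segmentation ..... 12".
TOC_LINE = re.compile(
    r"^(?P<num>\d+(?:\.\d+)*)\s+(?P<title>.+?)[\s.]*(?P<page>\d+)$"
)


@dataclass
class TocEntry:
    number: str      # section number as printed, e.g. "2.1"
    title: str       # headline text
    level: int       # nesting depth, derived from the number of dots
    start_page: int  # page on which the section starts


def parse_toc(lines):
    """Turn recognized contents-page lines into a flat list of entries."""
    entries = []
    for line in lines:
        match = TOC_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not look like contents entries
        number = match.group("num")
        entries.append(
            TocEntry(
                number=number,
                title=match.group("title"),
                level=number.count(".") + 1,
                start_page=int(match.group("page")),
            )
        )
    return entries


def section_for_page(entries, page):
    """Return the last entry starting on or before `page`.

    Assumes `entries` is ordered by start page, as in a printed contents list.
    """
    current = None
    for entry in entries:
        if entry.start_page > page:
            break
        current = entry
    return current


toc_lines = [
    "Contents",
    "1 Introduction ..... 1",
    "2 Layout analysis ..... 9",
    "2.1 Block segmentation ..... 12",
    "3 Logical structure ..... 20",
]
entries = parse_toc(toc_lines)
```

With the entries in hand, `section_for_page(entries, 15)` returns the entry for section 2.1, which a later stage could use as a prior when deciding whether a text block on that page is a headline or body text.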