Conference: IAPR International Conference on Document Analysis and Recognition

Page Retrieval System in Digitized Historical Books Based on Error-Tolerant Subgraph Matching


Abstract

Developing smart ways of interacting with scanners is one of the emerging needs identified by numerous digitization professionals. To improve this interaction, the research community in historical document image analysis is particularly interested in providing reliable tools for computer-aided indexing and retrieval of historical document images. In this article, we therefore propose a method that retrieves, from a digitized historical book, the pages whose layout and/or content match a user-defined query. Among the possible user-defined queries, we focus on transition pages (e.g. chapter title pages, end-of-chapter and end-of-act pages) and on pages containing a particular content component or a group of patterns (e.g. ornaments, illustrations and drop caps). The method first uses low-level features (texture, shape and geometric descriptors) to represent each page as a graph-based signature. A set of costs is then estimated with an error-tolerant subgraph isomorphism algorithm. These costs measure the similarity between the user-defined query, formulated as a pattern graph, and the different subgraphs of the book page signatures, and are used to find the book pages similar to the query. To illustrate the effectiveness of the proposed method, a thorough experimental study has been conducted, with quantitative observations obtained from a large number of queries having different contents and structures.
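The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract specifies neither the cost functions nor the search strategy, so everything below is an assumption made for illustration. Each graph is taken to be a dictionary of node feature vectors plus a list of undirected edges, the node cost is the Euclidean distance between feature vectors, each pattern edge with no counterpart in the page adds a fixed penalty, and the search is brute force (practical only for very small graphs). The names `subgraph_match_cost`, `rank_pages` and `missing_edge_cost` are hypothetical.

```python
from itertools import permutations
import math


def subgraph_match_cost(pattern_nodes, pattern_edges, page_nodes, page_edges,
                        missing_edge_cost=1.0):
    """Cheapest injective mapping of pattern nodes onto page nodes.

    Assumed cost model (not taken from the paper): Euclidean distance
    between node feature vectors, plus a fixed penalty for every pattern
    edge that has no matching edge in the page graph.
    """
    pattern_ids = list(pattern_nodes)
    page_ids = list(page_nodes)
    if len(pattern_ids) > len(page_ids):
        return math.inf, None  # the pattern cannot fit inside the page graph

    page_edge_set = {frozenset(edge) for edge in page_edges}
    best_cost, best_mapping = math.inf, None

    # Brute force: try every ordered selection of page nodes.
    for candidate in permutations(page_ids, len(pattern_ids)):
        mapping = dict(zip(pattern_ids, candidate))
        # Node substitution cost.
        cost = sum(math.dist(pattern_nodes[p], page_nodes[g])
                   for p, g in mapping.items())
        # Error tolerance: a missing edge is penalized, not rejected.
        for u, v in pattern_edges:
            if frozenset((mapping[u], mapping[v])) not in page_edge_set:
                cost += missing_edge_cost
        if cost < best_cost:
            best_cost, best_mapping = cost, mapping
    return best_cost, best_mapping


def rank_pages(pattern, pages):
    """Return (page_id, cost) pairs sorted from most to least similar."""
    pattern_nodes, pattern_edges = pattern
    scored = []
    for page_id, (page_nodes, page_edges) in pages.items():
        cost, _ = subgraph_match_cost(pattern_nodes, pattern_edges,
                                      page_nodes, page_edges)
        scored.append((page_id, cost))
    return sorted(scored, key=lambda item: item[1])


# Toy query: an ornament region adjacent to a text region.
# The two-dimensional feature vectors are invented for the example.
query = ({"ornament": (1.0, 0.0), "text": (0.0, 1.0)}, [("ornament", "text")])
pages = {
    "page_1": ({1: (1.0, 0.0), 2: (0.0, 1.0), 3: (0.5, 0.5)}, [(1, 2), (2, 3)]),
    "page_2": ({1: (1.0, 0.0), 2: (0.0, 1.0)}, []),
}
ranking = rank_pages(query, pages)
print(ranking)
```

In this toy data, page_1 contains an exact copy of the query as a subgraph, while page_2 has matching regions but lacks the adjacency edge, so it is still retrieved at a higher cost rather than discarded. That tolerance is what distinguishes error-tolerant matching from exact subgraph isomorphism.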
