Conference: International Conference on Document Analysis and Recognition

Document page similarity based on layout visual saliency: Application to query by example and document classification

Abstract

In this paper we propose a measure of visual similarity for comparing pages in a corpus. The measure is based on an analysis of the visual saliency of the page layout, and it is computed from both the document layout and characteristics of the text itself. The text is characterized by statistical features derived from textural primitives. Our purpose is to establish perceptual links between documents in order to facilitate their storage and retrieval. We present two possible applications of this similarity measure: querying the corpus by example and document classification. In the first application, we extract the documents that are most visually similar to a document given as a query. In the second, the similarity measure is used to classify a document according to its visual similarity to a reference set of documents. Our test corpus is extracted from the Finland MTDB Oulu multi-genre database, which provides a great diversity of page layouts and contents.
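
The two applications described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the saliency-based similarity measure, so the code below substitutes plain cosine similarity over made-up feature vectors. The function names (`cosine_similarity`, `query_by_example`, `classify`), the page identifiers, the labels, and the 1-nearest-neighbour decision rule are all assumptions for illustration, not the authors' method.

```python
import math


def cosine_similarity(a, b):
    """Stand-in similarity (assumption): the paper's layout-saliency
    measure is not given in the abstract, so cosine similarity is used."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def query_by_example(query, corpus, k=3):
    """Return the k (page_id, score) pairs most similar to the query, best first."""
    scored = [(page_id, cosine_similarity(query, feats))
              for page_id, feats in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def classify(query, reference, labels):
    """Assign the label of the single most similar reference page
    (1-nearest neighbour; the paper's actual rule may differ)."""
    best_id, _ = query_by_example(query, reference, k=1)[0]
    return labels[best_id]


# Hypothetical feature vectors; real ones would come from layout and
# text-texture analysis of each page.
corpus = {
    "article_a": [0.9, 0.1, 0.0],
    "article_b": [0.8, 0.2, 0.1],
    "advert_a": [0.1, 0.9, 0.3],
}
labels = {
    "article_a": "article",
    "article_b": "article",
    "advert_a": "advertisement",
}
query = [0.85, 0.15, 0.05]

top_two = query_by_example(query, corpus, k=2)
predicted = classify(query, corpus, labels)
```

Here `top_two` holds the two reference pages ranked closest to the query, and `predicted` is the label borrowed from the closest one. Swapping in a different similarity function only requires replacing `cosine_similarity`.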