
Distance measures for layout-based document image retrieval

Abstract

Most methods for document image retrieval rely solely on text information to find similar documents. This paper describes a way to use layout information for document image retrieval instead. A new class of distance measures is introduced for documents with Manhattan layouts, based on a two-step procedure: First, the distances between the blocks of two layouts are calculated. Then, the blocks of one layout are assigned to the blocks of the other layout in a matching step. Different block distances and matching methods are compared and evaluated using the publicly available MARG database. On this dataset, the layout type can be determined successfully in 92.6% of the cases using the best distance measure in a nearest neighbor classifier. The experiments show that the best distance measure for this task is the overlapping area combined with the Manhattan distance of the corner points as block distance together with the minimum weight edge cover matching.
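
The two-step procedure in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: blocks are stored as `(x0, y0, x1, y1)` rectangles, the overlap term and the corner distance are combined by a simple product (the abstract does not state the combination rule), and the function names `block_distance`, `layout_distance`, and `classify` are hypothetical. The matching step computes a minimum weight edge cover through the standard reduction to a matching problem, solved here by an exhaustive search that is only practical for layouts with a handful of blocks.

```python
from functools import lru_cache

# Assumption: a block is an axis-aligned rectangle (x0, y0, x1, y1), x0 < x1, y0 < y1.

def manhattan_corner_distance(a, b):
    """Manhattan distance between the upper-left and lower-right corners."""
    return sum(abs(p - q) for p, q in zip(a, b))

def area(a):
    return (a[2] - a[0]) * (a[3] - a[1])

def overlap_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def block_distance(a, b):
    """Hypothetical combination: corner distance scaled by the non-overlapping fraction."""
    non_overlap = 1.0 - overlap_area(a, b) / max(area(a), area(b))
    return manhattan_corner_distance(a, b) * non_overlap

def layout_distance(layout_a, layout_b):
    """Weight of a minimum weight edge cover between two non-empty block lists."""
    # Step 1: distances between all pairs of blocks.
    w = [[block_distance(a, b) for b in layout_b] for a in layout_a]
    row_min = [min(row) for row in w]
    col_min = [min(row[j] for row in w) for j in range(len(layout_b))]

    # Step 2: matching. Covering every block by its cheapest edge costs
    # sum(row_min) + sum(col_min); pairing blocks i and j directly changes
    # that cost by w[i][j] - row_min[i] - col_min[j]. Search for the set of
    # disjoint pairs with the largest total reduction.
    @lru_cache(maxsize=None)
    def best_saving(i, used):
        if i == len(w):
            return 0.0
        best = best_saving(i + 1, used)  # block i keeps its cheapest edge
        for j in range(len(col_min)):
            if not used & (1 << j):
                gain = w[i][j] - row_min[i] - col_min[j]
                if gain < 0:
                    best = min(best, gain + best_saving(i + 1, used | (1 << j)))
        return best

    return sum(row_min) + sum(col_min) + best_saving(0, 0)

def classify(query, labelled_layouts):
    """1-nearest-neighbour: label of the reference layout closest to the query."""
    return min(labelled_layouts, key=lambda item: layout_distance(query, item[0]))[1]

# Toy data, invented for illustration only (not taken from the MARG database).
one_column = [(0, 0, 100, 100)]
two_column = [(0, 0, 45, 100), (55, 0, 100, 100)]
query = [(0, 0, 44, 100), (56, 0, 100, 100)]
print(classify(query, [(one_column, "one-column"), (two_column, "two-column")]))
```

An edge cover, unlike a one-to-one assignment, lets several blocks of one layout map to a single block of the other, which is useful when two layouts have different numbers of blocks. For realistic block counts the exhaustive search would be replaced by a polynomial-time assignment solver.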
