Conference: International Conference on Document Analysis and Recognition

Query by string word spotting based on character bi-gram indexing

Abstract

In this paper, we propose a segmentation-free query-by-string word spotting method. Both the documents and the query strings are encoded using a recently proposed word representation that projects images and strings into a common attribute space based on a Pyramidal Histogram of Characters (PHOC). These attribute models are learned using linear SVMs over the Fisher Vector [8] representation of the images, along with the PHOC labels of the corresponding strings. To search through the whole page, document regions are indexed per character bi-gram using a similar attribute representation. In addition, we propose an integral image representation of the document that uses a simplified version of the attribute model for efficient computation. Finally, we introduce a re-ranking step to boost retrieval performance. We show state-of-the-art results for segmentation-free query-by-string word spotting on standard single-writer and multi-writer datasets.
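
To make the string side of the PHOC encoding concrete, the following is a minimal sketch of how a query string could be turned into a binary pyramidal histogram of characters. It is not the authors' implementation: the alphabet (lowercase letters plus digits), the pyramid levels (2, 3, 4), the 50% overlap rule, and the function name `phoc` are all assumptions made for illustration, and published PHOC variants may use other levels or add bi-gram bins. The image side (Fisher Vectors and SVM-learned attributes) is not shown.

```python
import string

# Assumed alphabet and pyramid levels; illustrative choices only.
ALPHABET = string.ascii_lowercase + string.digits
LEVELS = (2, 3, 4)


def phoc(word, alphabet=ALPHABET, levels=LEVELS):
    """Return a binary PHOC-style vector for `word` (hypothetical helper).

    At each pyramid level the word is split into `level` equal regions.
    A character is assigned to a region when at least half of the
    character's span falls inside that region.
    """
    word = word.lower()
    n = len(word)
    index = {c: i for i, c in enumerate(alphabet)}
    vec = [0] * (len(alphabet) * sum(levels))
    offset = 0
    for level in levels:
        for region in range(level):
            # Work in integer units of 1 / (n * level) to avoid float rounding:
            # region spans [region * n, (region + 1) * n],
            # character `pos` spans [pos * level, (pos + 1) * level].
            r_start, r_end = region * n, (region + 1) * n
            for pos, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                c_start, c_end = pos * level, (pos + 1) * level
                overlap = min(r_end, c_end) - max(r_start, c_start)
                # Character length is `level` units; require >= 50% overlap.
                if 2 * overlap >= level:
                    vec[offset + index[ch]] = 1
            offset += len(alphabet)
    return vec
```

With these assumed settings, `phoc("spotting")` yields a vector of 36 × (2 + 3 + 4) = 324 binary attributes. In a setup like the one the abstract describes, such a vector would be compared against attribute scores predicted from image regions, since both live in the same attribute space.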
