IAPR International Workshop on Document Analysis Systems

Synthetically Generated Semantic Codebook for Bag-of-Visual-Words Based Word Spotting

Abstract

Word-spotting methods based on the Bag-of-Visual-Words framework have demonstrated good retrieval performance even when used in a completely unsupervised manner. Although unsupervised approaches are suitable for large document collections due to the cost of acquiring labeled data, these methods also present some drawbacks. For instance, training a suitable "codebook" for a given dataset has a high computational cost. Therefore, in this paper we present a database-agnostic codebook which is trained from synthetic data. The aim of the proposed approach is to generate a codebook where the only information required is the type of script used in the document. The use of synthetic data also makes it easy to incorporate semantic information into the codebook generation. As a result, the proposed method is able to determine which set of codewords has a semantic representation of the descriptor feature space. Experimental results show that the resulting codebook attains state-of-the-art performance while having a more compact representation.
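As a rough illustration of what a Bag-of-Visual-Words codebook is, the sketch below clusters local descriptors with plain k-means and then encodes one word image as a normalized histogram of codeword assignments. It is a generic sketch, not the method of the paper: the abstract does not specify the descriptors, the clustering procedure, or how semantic information selects codewords. The function names (`train_codebook`, `bovw_histogram`), the descriptor dimensionality, the codebook size, and the random stand-in data are all assumptions made for illustration. In a real setting the training descriptors would be extracted from synthetically rendered word images in the target script.

```python
import numpy as np


def train_codebook(descriptors, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns a (k, d) array of codewords."""
    rng = np.random.default_rng(seed)
    # Initialise the codewords with k distinct descriptors chosen at random.
    start = rng.choice(len(descriptors), size=k, replace=False)
    codebook = descriptors[start].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every codeword.
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the previous codeword if a cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def bovw_histogram(descriptors, codebook):
    """Encode one image as an L1-normalised histogram of nearest codewords."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(codebook)).astype(float)
    return counts / counts.sum()


# Stand-in data (assumption): random vectors in place of real local descriptors.
rng = np.random.default_rng(42)
synthetic_descriptors = rng.normal(size=(500, 16))  # "synthetic" training set
codebook = train_codebook(synthetic_descriptors, k=8)

word_descriptors = rng.normal(size=(40, 16))  # descriptors of one word image
hist = bovw_histogram(word_descriptors, codebook)
```

Because the codebook is trained once on synthetic descriptors, the same `codebook` array could in principle be reused to encode word images from any collection written in the same script, which is the database-agnostic property the abstract describes.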