Conference: IEEE International Conference on Multimedia and Expo

REPRESENTING WORD IMAGE USING VISUAL WORD EMBEDDINGS AND RNN FOR KEYWORD SPOTTING ON HISTORICAL DOCUMENT IMAGES

Abstract

In the Bag-of-Visual-Words (BoVW) framework, visual words are treated as independent of one another, which discards the spatial order of the visual words and leaves the representation without semantic information. Inspired by word embeddings, this study applies a similar embedding procedure to a large number of visual words to obtain an embedding vector for each visual word. The embedding vector of a word image is then taken as the average of the embedding vectors of all visual words within that image. In addition, a Recurrent Neural Network (RNN) is used, in the manner of an auto-encoder, to encode each word image into an embedding. The RNN embeddings and the visual word embeddings are complementary, so every word image is represented by combining the two. Experimental results show that the proposed representation outperforms traditional BoVW, spatial pyramid matching, and latent Dirichlet allocation.
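The averaging and combining steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the two embeddings are combined, so concatenating the L2-normalized vectors is an assumption here. The function names, the toy embedding table, and the stand-in RNN vector are all hypothetical.

```python
import numpy as np


def average_visual_word_embedding(visual_word_ids, embedding_table):
    """Average the embedding vectors of all visual words found in one word image.

    visual_word_ids: indices of the visual words detected in the image.
    embedding_table: array of shape (vocabulary_size, dim), one row per visual word.
    """
    ids = np.asarray(visual_word_ids, dtype=int)
    if ids.size == 0:
        # No visual words detected: fall back to a zero vector.
        return np.zeros(embedding_table.shape[1])
    return embedding_table[ids].mean(axis=0)


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit length; eps guards against division by zero."""
    return vector / (np.linalg.norm(vector) + eps)


def combine_embeddings(visual_emb, rnn_emb):
    """Assumed combination rule: concatenate the two L2-normalized embeddings."""
    return np.concatenate([l2_normalize(visual_emb), l2_normalize(rnn_emb)])


# Toy data (hypothetical): a vocabulary of 3 visual words with 2-D embeddings.
embedding_table = np.array([[1.0, 0.0],
                            [0.0, 1.0],
                            [1.0, 1.0]])

# A word image in which visual words 0, 1, 2 and 2 were detected.
visual_emb = average_visual_word_embedding([0, 1, 2, 2], embedding_table)

# Stand-in for the embedding an RNN encoder would produce for the same image.
rnn_emb = np.array([3.0, 4.0, 0.0])

representation = combine_embeddings(visual_emb, rnn_emb)
```

Normalizing each part before concatenation keeps either embedding from dominating a distance computation purely because of its scale. A weighted sum or a learned fusion would be equally compatible with the abstract's wording.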
