LDA-Based Word Image Representation for Keyword Spotting on Historical Mongolian Documents

Abstract

The original Bag-of-Visual-Words (BoVW) approach discards the spatial relations among visual words. In this paper, an LDA-based topic model is adopted to capture the semantic relations among the visual words of each word image. Because an LDA-based topic model usually hurts retrieval performance when used on its own, it is linearly combined with a visual language model for each word image. The basic query likelihood model is then used to carry out retrieval. Experimental results on our dataset show that the proposed LDA-based representation performs keyword spotting on a collection of historical Mongolian documents efficiently and accurately, and that it significantly outperforms the original BoVW approach.
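
The abstract names the pipeline but gives no formulas, so the following is only a minimal sketch of one plausible reading: each word image gets a visual-word distribution that linearly interpolates a unigram visual language model with an LDA-derived distribution, and a query is scored by its log-likelihood under that distribution. The mixing weight `lam`, the add-alpha smoothing, and the toy values of `PHI` and `theta` are all assumptions made for illustration. In practice the topic parameters would come from a trained LDA model, and the paper's actual formulation may differ.

```python
import math
from collections import Counter


def visual_language_model(visual_words, vocab_size, alpha=1.0):
    """Unigram model over visual words; add-alpha smoothing is an assumption."""
    counts = Counter(visual_words)
    total = len(visual_words) + alpha * vocab_size
    return [(counts[w] + alpha) / total for w in range(vocab_size)]


def lda_word_distribution(theta, phi):
    """P_lda(w | image) = sum over topics k of theta[k] * phi[k][w]."""
    vocab_size = len(phi[0])
    return [sum(t * topic[w] for t, topic in zip(theta, phi))
            for w in range(vocab_size)]


def combine(p_vlm, p_lda, lam=0.7):
    """Linear interpolation; lam is a hypothetical mixing weight."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_vlm, p_lda)]


def query_log_likelihood(query_words, p_image):
    """log P(query | image), treating query visual words as independent."""
    return sum(math.log(p_image[w]) for w in query_words)


def rank(query_words, image_models):
    """Return image indices ordered from most to least likely."""
    scores = [query_log_likelihood(query_words, p) for p in image_models]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data: 4 visual words, 2 topics, 2 word images (all values made up).
PHI = [[0.7, 0.1, 0.1, 0.1],
       [0.1, 0.1, 0.1, 0.7]]
IMAGES = [
    {"visual_words": [0, 0, 1, 0], "theta": [0.9, 0.1]},
    {"visual_words": [3, 3, 2, 3], "theta": [0.1, 0.9]},
]
MODELS = [
    combine(visual_language_model(img["visual_words"], 4),
            lda_word_distribution(img["theta"], PHI))
    for img in IMAGES
]

if __name__ == "__main__":
    # A query dominated by visual word 0 should rank image 0 first.
    print(rank([0, 0, 1], MODELS))
```

Because both component distributions sum to one, their interpolation is also a valid distribution, and the smoothing keeps every probability positive so the logarithm is always defined.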

Bibliographic information

Source
International Conference on Neural Information Processing, 2016, pp. 432-441 (10 pages)
Authors
Hongxi Wei; Guanglai Gao; Xiangdong Su
Original format
PDF
Keywords
Latent Dirichlet Allocation (LDA); Topic model; Visual language model; Keyword spotting; Query likelihood model
