Key word spotting using HMM in printed Telugu documents

Abstract

With the growth of multimedia technology and the internet, the volume of documents being stored and retrieved has increased rapidly. The government has taken several measures to scan documents and store them digitally for future use. Although these documents are available in digital format, it is still very difficult to search them for a single word or phrase. Traditional optical character recognition (OCR) techniques and other text retrieval methods fail on these document images because of various types of noise. Word spotting helps users automatically search for a particular word or phrase in millions of such document images. In this paper we propose a word spotting technique for printed Telugu documents. A collection of document images is converted into a collection of word images by word segmentation, and a number of profile-based features are extracted to represent each word image. Correlation and a hidden Markov model (HMM) are applied to compare word images. Image-to-image matching is done by calculating the similarity between a query word image and each word image in the collection.
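
The correlation branch of the matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the exact profile features, so the three used here (vertical projection profile, upper word profile, lower word profile) are an assumption based on common word spotting practice. The function names `profile_features`, `correlation_score`, and `rank_words` and the parameter `n_cols` are hypothetical. Word images are assumed to be already segmented and binarized, with nonzero pixels as ink. The HMM comparison is not shown.

```python
import numpy as np


def profile_features(word, n_cols=32):
    """Return three column-wise profiles of a binary word image,
    each resampled to n_cols values and concatenated into one vector."""
    ink = np.asarray(word) > 0
    h, w = ink.shape
    has_ink = ink.any(axis=0)
    # Fraction of ink pixels in each column.
    projection = ink.sum(axis=0) / h
    # Distance from the top edge to the first ink pixel (1.0 for empty columns).
    upper = np.where(has_ink, ink.argmax(axis=0), h) / h
    # Distance from the bottom edge to the last ink pixel (1.0 for empty columns).
    lower = np.where(has_ink, ink[::-1].argmax(axis=0), h) / h
    # Resample to a fixed length so words of different widths are comparable.
    src = np.linspace(0.0, 1.0, w)
    dst = np.linspace(0.0, 1.0, n_cols)
    return np.concatenate(
        [np.interp(dst, src, p) for p in (projection, upper, lower)]
    )


def correlation_score(a, b):
    """Pearson correlation coefficient between two feature vectors."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_words(query, collection, n_cols=32):
    """Rank every word image in the collection by similarity to the query.
    Returns (index, score) pairs, best match first."""
    q = profile_features(query, n_cols)
    scores = [correlation_score(q, profile_features(w, n_cols)) for w in collection]
    order = sorted(range(len(collection)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]
```

Calling `rank_words(query_image, word_images)` gives the collection indices ordered by correlation with the query. Resampling every profile to a fixed length is a simplification chosen for this sketch; a sequence model such as an HMM would instead consume the per-column features directly.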

Bibliographic details

Source
International Conference on Signal Processing, Communication, Power and Embedded System, 2016, pp. 1997-2000 (4 pages)
Authors
D. Nagasudha; Y. Madhavee Latha
Original format: PDF
Keywords
Hidden Markov models; Feature extraction; Correlation coefficient; Image retrieval; Image recognition; Signal processing; Optical character recognition software

