Key word spotting using HMM in printed Telugu documents

Abstract

With the growth of multimedia technology and the internet, the storage and retrieval of documents has grown rapidly. The government has adopted several methods to scan documents and store them digitally for future use. Even though the documents are available in digital format, it is very difficult to search them for a single word or phrase. Traditional optical character recognition (OCR) techniques and other text retrieval methods fail on these document images because of various types of noise. Word spotting helps users automatically search for a particular word or phrase in millions of such document images. In this paper we propose a word spotting technique for printed Telugu documents. A collection of document images is converted into a collection of word images by word segmentation, and a number of profile-based features are extracted to represent each word image. Correlation and a hidden Markov model (HMM) are applied to compare word images. Image-to-image matching is done by calculating the similarity between a query word image and each word image in the collection.
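The abstract does not say which profile features are used or how the correlation is computed, so the following is only a minimal sketch of the correlation-based matching step under stated assumptions. It assumes already segmented, binarized word images (1 = ink), three commonly used column-wise profiles (projection, upper, lower), nearest-neighbour resampling to a fixed length, and an averaged Pearson correlation as the similarity score. All function names, the resampling length and the threshold are hypothetical, and the HMM-based comparison is not covered.

```python
from math import sqrt


def profiles(img):
    """Column-wise profiles of a binary word image (list of rows, 1 = ink).

    Returns (projection, upper, lower); these three profiles are an
    assumption, not necessarily the feature set used in the paper.
    """
    h, w = len(img), len(img[0])
    proj, upper, lower = [], [], []
    for x in range(w):
        ink = [y for y in range(h) if img[y][x]]
        proj.append(len(ink))                         # ink pixels in the column
        upper.append(ink[0] if ink else h)            # gap from top to first ink
        lower.append(h - 1 - ink[-1] if ink else h)   # gap from bottom to last ink
    return proj, upper, lower


def resample(seq, n):
    """Nearest-neighbour resampling so words of different widths compare."""
    m = len(seq)
    return [seq[min(m - 1, i * m // n)] for i in range(n)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    den = sqrt(sum(x * x for x in da)) * sqrt(sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else 0.0


def similarity(query, candidate, n=32):
    """Mean correlation over the three profiles (hypothetical scoring rule)."""
    pairs = zip(profiles(query), profiles(candidate))
    scores = [pearson(resample(a, n), resample(b, n)) for a, b in pairs]
    return sum(scores) / len(scores)


def spot(query, collection, threshold=0.9):
    """Indices of word images whose similarity to the query meets the threshold."""
    return [i for i, word in enumerate(collection)
            if similarity(query, word) >= threshold]


# Toy 4x4 "word images"; real inputs would come from word segmentation.
word_a = [[0, 1, 0, 0],
          [1, 1, 0, 1],
          [1, 1, 1, 1],
          [0, 1, 0, 0]]
word_b = [row[::-1] for row in word_a]  # mirrored, so the profiles differ

matches = spot(word_a, [word_b, word_a])
```

Here `spot` compares the query against every word image in the collection, mirroring the image-to-image matching described above. In practice the threshold would need tuning on real scans, and noisy documents would likely need more robust features than these raw profiles.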
