Source journal: Journal of digital information management

Content Based Text Information Search and Retrieval in Document Images for Digital Library


Abstract

The main objective of this research work is to find keywords in captured or scanned printed document images stored in an image database. Document images are becoming more popular and are used in paperless offices and digital libraries. Information retrieval from document images is a very challenging task. Hence, there is a need to develop search strategies that find the required information in these document images according to the user's needs. Traditionally, Optical Character Recognition (OCR) tools are used for information retrieval from document images, but this is not an efficient method. Word spotting is an inventive method for searching document images and retrieving relevant information without any conversion. In this work, an algorithm named Enhanced Dynamic Time Warping (EDTW), based on the word spotting technique, is proposed for finding keywords in document images. Different matching algorithms are available for word spotting; popular ones are Normalized Cross Correlation (NCC) and Dynamic Time Warping (DTW). We compare the performance of these two existing algorithms with the proposed EDTW algorithm. Different image formats and image sizes are used for experimentation. The results show that the proposed algorithm produces better results than the existing ones.
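For readers unfamiliar with the two baseline matchers named in the abstract, the following is a minimal sketch of textbook DTW and NCC applied to one-dimensional feature sequences. It does not reproduce the proposed EDTW algorithm, whose details the abstract does not give. The function names `dtw_distance` and `ncc`, and the idea of describing a word image by a list of per-column values, are assumptions made purely for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two numeric sequences (illustrative only).

    Each sequence is assumed to be a 1-D feature profile of a word image,
    for example one value per pixel column. This is NOT the paper's EDTW.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def ncc(a, b):
    """Normalized cross-correlation of two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("NCC as sketched here needs equal-length inputs")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A constant sequence has zero variance; report no correlation.
    return num / den if den else 0.0


# Hypothetical profiles: the second is a horizontally stretched copy of the first.
query = [1, 2, 3]
candidate = [1, 2, 2, 3]
print(dtw_distance(query, candidate))
```

In this sketch DTW tolerates the stretched candidate because one query element may align with several candidate elements, whereas NCC as written requires equal-length inputs. That difference is one commonly cited reason DTW is used for matching word images of varying width.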
