
The Use of Latent Semantic Indexing to Mitigate OCR Effects of Related Document Images


Abstract

Due to the widespread, multipurpose use of document images and the large number of document image repositories now available, robust information retrieval mechanisms and systems are increasingly in demand. This paper presents an approach that supports the automatic generation of relationships among document images by exploiting Latent Semantic Indexing (LSI) and Optical Character Recognition (OCR). We developed the LinkDI (Linking of Document Images) service, which extracts and indexes the content of document images, computes its latent semantics, and defines relationships among images as hyperlinks. LinkDI was tested on document image repositories, and its performance was evaluated by comparing the quality of the relationships created among textual documents with that of the relationships created among their respective document images. Using those same document images, we ran further experiments to compare the performance of LinkDI with and without the LSI technique. Experimental results showed that LSI can mitigate the effects of common OCR misrecognition, which reinforces the feasibility of using LinkDI to relate highly degraded OCR output.
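The abstract does not spell out LinkDI's internals. As a rough illustration of the general LSI idea it relies on, the Python sketch below (assuming NumPy is available) builds a TF-IDF term-document matrix from OCR text, projects the documents into a k-dimensional latent space with a truncated SVD, and proposes a link between two documents when their cosine similarity reaches a threshold. The function names, the tokenizer, the TF-IDF weighting, and the values of `k` and `threshold` are hypothetical choices for this example, not parameters reported for LinkDI.

```python
# Hypothetical sketch of LSI-based document linking; not LinkDI's actual code.
import re

import numpy as np


def tokenize(text):
    # Lowercase and keep alphabetic runs; OCR noise such as a digit inside a
    # word ("semant1c") splits it into separate tokens.
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Build a terms x documents matrix of TF-IDF weights."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            counts[index[t], j] += 1
    df = np.count_nonzero(counts, axis=1)  # document frequency per term
    idf = np.log(len(docs) / df) + 1.0  # +1 so shared terms keep some weight
    return counts * idf[:, None]


def lsi_document_vectors(matrix, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    return (np.diag(s[:k]) @ vt[:k]).T  # one row per document


def cosine_similarity(vectors):
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty documents
    unit = vectors / norms
    return unit @ unit.T


def link_documents(docs, k=2, threshold=0.8):
    """Return (i, j, score) pairs whose latent similarity reaches threshold."""
    sims = cosine_similarity(lsi_document_vectors(term_document_matrix(docs), k))
    links = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if sims[i, j] >= threshold:
                links.append((i, j, float(sims[i, j])))
    return links


docs = [
    "latent semantic indexing of scanned document images",
    "latent semant1c indexing of scanned docurnent images",  # simulated OCR errors
    "recipe for baking sourdough bread at home",
]
print(link_documents(docs, k=2))  # truncated: 2 latent dimensions
print(link_documents(docs, k=3))  # no truncation: plain term-vector cosine
```

In this toy corpus the second document is an OCR-degraded copy of the first ("semant1c", "docurnent"). With `k=3` nothing is truncated, so the latent similarity equals the plain term-vector cosine (about 0.48) and no link is proposed; with `k=2` the two versions fall onto the same latent direction and are linked. This only illustrates the mechanism on three documents and says nothing about the paper's experimental results.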
