Conference: International Conference on Document Analysis and Recognition

Content-based indexing and retrieval method of Chinese document images


Abstract

In Chinese information retrieval, indexing a Chinese text document is straightforward: we only need to segment the text into phrases. When the document is a Chinese document image (a non-ASCII file), one option is to first convert the image into a text file using Chinese optical character recognition (OCR) and then index the text with an information retrieval algorithm. However, OCR is time-consuming, which reduces retrieval efficiency. This paper proposes an indexing method based on the stroke density code. The document image is first segmented into individual Chinese character images; the stroke density of each character image is then calculated; finally, the stroke density code of each character image is obtained. The indexing method is fast and robust to noise. In addition, this paper presents a retrieval method for Chinese document images built on this index, and discusses how the indexing and retrieval method applies to duplicate detection. We demonstrate the validity of the indexing method through its application to keyword spotting and duplicate detection.
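The pipeline described in the abstract (segment, compute stroke density, derive a code, index, retrieve) can be illustrated with a minimal Python sketch. The abstract does not define the stroke density code, so everything below is an assumption made for illustration: stroke density is approximated by the number of background-to-ink crossings along a few evenly spaced horizontal and vertical scan lines of a binary character image, and the code is the tuple of those counts. The function names (`count_crossings`, `stroke_density_code`, `build_index`, `spot_keyword`) and the `scans` parameter are hypothetical, and character segmentation is assumed to have been done already.

```python
from collections import defaultdict


def count_crossings(line):
    """Count background-to-ink (0 -> 1) transitions along one scan line."""
    crossings = 0
    previous = 0
    for pixel in line:
        if pixel == 1 and previous == 0:
            crossings += 1
        previous = pixel
    return crossings


def stroke_density_code(char_image, scans=2):
    """Return a tuple code for a binary character image (rows of 0/1).

    Assumption (not from the paper): the code is the crossing counts on
    `scans` evenly spaced horizontal lines followed by the counts on
    `scans` evenly spaced vertical lines.
    """
    height = len(char_image)
    width = len(char_image[0])
    rows = [(i + 1) * height // (scans + 1) for i in range(scans)]
    cols = [(i + 1) * width // (scans + 1) for i in range(scans)]
    horizontal = [count_crossings(char_image[r]) for r in rows]
    vertical = [count_crossings([row[c] for row in char_image]) for c in cols]
    return tuple(horizontal + vertical)


def build_index(documents):
    """Map each code to the (doc_id, position) pairs where it occurs.

    `documents` is {doc_id: [char_image, ...]} in reading order.
    """
    index = defaultdict(list)
    for doc_id, char_images in documents.items():
        for position, image in enumerate(char_images):
            index[stroke_density_code(image)].append((doc_id, position))
    return index


def spot_keyword(index, keyword_images):
    """Return sorted (doc_id, start) pairs where the keyword's codes
    appear at consecutive character positions."""
    codes = [stroke_density_code(image) for image in keyword_images]
    hits = set(index.get(codes[0], []))
    for offset, code in enumerate(codes[1:], start=1):
        following = set(index.get(code, []))
        hits = {(d, p) for (d, p) in hits if (d, p + offset) in following}
    return sorted(hits)
```

Because the index stores only short integer tuples, no character is ever recognized; a keyword query is rendered or cropped as character images, coded the same way, and matched by exact code lookup. Exact matching is a simplification chosen for brevity; a noise-tolerant variant would need some form of approximate code comparison, which this sketch does not attempt.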
