Content-based indexing and retrieval method of Chinese document images

Abstract

Indexing a Chinese text document for retrieval is straightforward: the text only needs to be segmented into phrases. When the document is a Chinese document image (a non-ASCII file), the usual approach is to first convert the image into a text file with Chinese optical character recognition (OCR) and then index the result with an information retrieval algorithm. However, OCR is time-consuming, which reduces retrieval efficiency. This paper proposes an indexing method based on a stroke density code. The document image is first segmented into individual Chinese character images; the stroke density of each character image is then calculated; finally, the stroke density code of each character image is obtained. The indexing method is fast and robust to noise. This paper also presents a retrieval method for Chinese document images built on this index, and discusses how the indexing and retrieval method applies to duplicate detection. The validity of the indexing method is demonstrated through its application to keyword spotting and duplicate detection.
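
The three indexing steps and the index-based lookup can be illustrated with a minimal Python sketch. The abstract does not define the stroke density code, so everything concrete here is an assumption made for illustration: stroke density is taken to be the number of foreground runs a scan line crosses, each character image is split into `bands` horizontal and vertical bands, and the code is the tuple of per-band maxima. The names `count_runs`, `stroke_density_code`, `build_index`, and `find_keyword` are hypothetical, and segmentation into character images is assumed to have been done already.

```python
from collections import defaultdict


def count_runs(line):
    """Count maximal runs of foreground (1) pixels along one scan line."""
    runs, prev = 0, 0
    for px in line:
        if px and not prev:
            runs += 1
        prev = px
    return runs


def stroke_density_code(glyph, bands=2):
    """Return a tuple code for a binary character image (list of 0/1 rows).

    Assumed definition, not taken from the paper: for each band of rows
    and each band of columns, record the largest number of strokes that
    any scan line in the band crosses.
    """
    rows = [list(r) for r in glyph]
    cols = [list(c) for c in zip(*rows)]
    code = []
    for lines in (rows, cols):
        n = len(lines)
        for b in range(bands):
            band = lines[b * n // bands:(b + 1) * n // bands]
            code.append(max((count_runs(l) for l in band), default=0))
    return tuple(code)


def build_index(char_images):
    """Map each code to the (doc_id, position) pairs where it occurs.

    char_images: iterable of (doc_id, position, glyph) triples produced
    by a prior segmentation step.
    """
    index = defaultdict(list)
    for doc_id, pos, glyph in char_images:
        index[stroke_density_code(glyph)].append((doc_id, pos))
    return index


def find_keyword(index, keyword_codes):
    """Return (doc_id, start) pairs where the codes appear consecutively."""
    if not keyword_codes:
        return []
    hits = []
    for doc_id, start in index.get(keyword_codes[0], []):
        if all((doc_id, start + k) in index.get(code, [])
               for k, code in enumerate(keyword_codes)):
            hits.append((doc_id, start))
    return hits
```

Because the lookup compares only short integer tuples, no character recognition is needed at query time; a keyword image is coded the same way and matched against the index. Exact tuple matching is the simplest choice for a sketch; a noise-tolerant variant would likely compare codes with a small distance threshold instead.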