Journal: Journal of digital information management

Retrieving and Processing Images from the Pages of a Historical Newspaper and Modeling the Text Topics

Abstract

Historical newspapers are a source of research material for the human and social sciences. However, these image collections are difficult for machines to read because of the low print quality, the lack of standardized page layouts, and the poor quality of the photographs in some files. This paper presents the processing model of a topic navigation system for historical newspaper page images. The general procedure consists of four modules: segmentation of text sub-images and text extraction; preprocessing and representation; induced topic extraction and representation; and a document viewing and retrieval interface. The algorithmic and technological approaches of each module are described, and initial test results on a collection spanning 28 years are presented.
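
As a rough illustration of how the four modules might hand data to one another, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not name the algorithms used, so every function name, the stop-word list, and the sample pages below are hypothetical. Module 1 (segmentation and text extraction) is assumed to have already produced plain text, and a simple term-frequency ranking stands in for the topic extraction step, which is not a real topic model.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a fuller, language-specific one.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "on"}

def preprocess(text):
    """Module 2 (sketch): lowercase, tokenize, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

def represent(docs):
    """Module 2 (sketch): bag-of-words term counts per document."""
    return {doc_id: Counter(preprocess(text)) for doc_id, text in docs.items()}

def top_terms(bags, k=3):
    """Module 3 stand-in: the k most frequent terms across the collection.
    This is NOT a topic model; it only marks where one would plug in."""
    total = Counter()
    for bag in bags.values():
        total.update(bag)
    # Sort by count descending, then alphabetically, for deterministic output.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def build_index(bags):
    """Module 4 (sketch): inverted index from each term to the documents containing it."""
    index = defaultdict(set)
    for doc_id, bag in bags.items():
        for term in bag:
            index[term].add(doc_id)
    return index

# Module 1 (segmentation and text extraction) is assumed to have already
# produced these strings; the sample text is made up for illustration.
docs = {
    "page-001": "The harvest festival opened in the town square.",
    "page-002": "Town council debates the harvest tax.",
}
bags = represent(docs)
index = build_index(bags)
```

Calling `top_terms(bags, 2)` ranks the most frequent terms in the sample pages, and `index["harvest"]` returns the set of page identifiers a retrieval interface could then display. Swapping the stand-in for an actual topic model would leave the surrounding data flow unchanged.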
