Retrieving and Processing Images from the Pages of a Historical Newspaper and Modeling the Text Topics

Gildacio J. de A. Sa; Jose E. B. Maia

Abstract

Historical newspapers are a research source for the human and social sciences. However, these image collections are difficult for machines to read because of low print quality, the lack of page standardization, and the poor photographic quality of some files. This paper presents the processing model of a topic navigation system for historical newspaper page images. The general procedure consists of four modules: segmentation of text sub-images and text extraction; preprocessing and representation; induced topic extraction and representation; and a document viewing and retrieval interface. The algorithmic and technological approaches of each module are described, and initial test results on a collection spanning 28 years are presented.
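
The abstract does not say which algorithms the modules use, so the following is only a rough sketch of what the "preprocessing and representation" and "retrieval" stages might look like once text has been extracted from the page images. TF-IDF weighting and cosine similarity are swapped in here as common stand-ins, not the authors' method. The segmentation, text extraction, and induced topic modules are omitted. The names `STOP_WORDS`, `tokenize`, `tfidf_vectors`, `cosine`, and `search`, and the sample pages, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller,
# language-specific list suited to the newspaper's language.
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is", "on"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so a term that occurs in every document keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, docs):
    """Return indices of documents with non-zero similarity to the query, best first."""
    vectors, idf = tfidf_vectors(docs)
    q_counts = Counter(tokenize(query))
    # Query terms that never occur in the collection are ignored.
    q_vec = {t: c * idf[t] for t, c in q_counts.items() if t in idf}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked if score > 0.0]


# Made-up page texts standing in for the output of the text extraction module.
pages = [
    "The harvest of cotton increased in the province this year",
    "Election results announced in the capital",
    "Cotton prices fall after the harvest",
]
print(search("cotton harvest", pages))
```

In this sketch a query returns only the pages that share at least one term with it, so the election page is never returned for "cotton harvest". A topic-based navigation layer, as described in the abstract, would sit on top of a representation like this one.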


Bibliographic record

Source
Journal of Digital Information Management, 2021, No. 2, pp. 41-46 (6 pages)
Authors
Gildacio J. de A. Sa; Jose E. B. Maia
Affiliation

Universidade Estadual do Ceara - UECE Ciencia da Computacao - CCT 60714-903 - Fortaleza - Ceara - Brasil (both authors)

Original format: PDF
Language: English
Keywords
Historical Newspapers; Lexical Standardization; Induced Topic Model; Information Retrieval; Natural Language Processing

Similar literature

1. iDocChip: A Configurable Hardware Architecture for Historical Document Image Processing: Multiresolution Morphology-based Text and Image Segmentation [J]. Menbere Kina Tekleyohannes, Vladimir Rybalkin, Muhammad Mohsin Ghaffar, International journal of parallel programming. 2021, No. 2

2. The value of critical destruction: Evaluating multispectral image processing methods for the analysis of primary historical texts [J]. Giacometti Alejandro, Campagnolo Alberto, MacDonald Lindsay, Literary & linguistic computing. 2017, No. 1

3. Stochastic Variational Inference-Based Parallel and Online Supervised Topic Model for Large-Scale Text Processing [J]. Yang Li, Wen-Zhuo Song, Bo Yang. Journal of Computer Science and Technology (English edition). 2018, No. 5

4. Segmenting Messy Text: Detecting Boundaries in Text Derived from Historical Newspaper Images [C]. Carol Anderson, Phil Crone. International Conference on Pattern Recognition. 2021

5. Remote sensing image processing, modelling and georeferencing: Automatic methodology to obtain oceanographic parameters (Spanish text) [D]. Eugenio Gonzalez, Francisco. 2000

6. Image Engine: an object-oriented multimedia database for storing retrieving and sharing medical images and text [O]. H. J. Lowe. 1993

7. A Framework for Text Processing and Supporting Access to Collections of Digitized Historical Newspapers [O]. Allen Robert B, Copeland Andrea J., Achananuparp Palakorn, 2007
