Automated Processing of Digitized Historical Newspapers: Identification of Segments and Genres

Abstract

Many historical newspapers are being digitized. We aim to support access to them through text analysis of the OCR'd content. However, the OCR output contains many errors, so extracting meaningful content from it is difficult. We propose a pipeline of processing steps and describe the first two here: segmentation and genre identification. The segmentation procedure, which is based on headings, was quite successful. Genre identification worked well for easily defined genre categories such as weather reports. We also propose additional techniques that may improve accuracy still further.
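The abstract does not specify how headings are detected or how genres are assigned. The following is a minimal illustrative sketch under stated assumptions, not the authors' method: it treats any short, mostly uppercase line as a heading (the thresholds `MAX_HEADING_WORDS` and `MIN_UPPER_RATIO` are hypothetical), starts a new segment at each heading, and labels a segment as a weather report when it contains at least two words from a hypothetical `WEATHER_KEYWORDS` list.

```python
import re
from dataclasses import dataclass, field

# Hypothetical heading heuristic (an assumption, not the paper's rule):
# a heading is a short line whose letters are mostly uppercase.
MAX_HEADING_WORDS = 8
MIN_UPPER_RATIO = 0.8

# Hypothetical keyword list for one easily defined genre (weather reports).
WEATHER_KEYWORDS = {"weather", "temperature", "rain", "fair", "forecast", "winds"}


@dataclass
class Segment:
    heading: str
    body: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return " ".join(self.body)


def is_heading(line: str) -> bool:
    """Return True if a line looks like a heading under the assumed heuristic."""
    letters = [c for c in line if c.isalpha()]
    if not letters or len(line.split()) > MAX_HEADING_WORDS:
        return False
    upper = sum(c.isupper() for c in letters)
    return upper / len(letters) >= MIN_UPPER_RATIO


def segment(lines: list[str]) -> list[Segment]:
    """Split OCR lines into segments, starting a new segment at each heading."""
    segments: list[Segment] = []
    current = Segment(heading="")
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank OCR lines
        if is_heading(line):
            if current.heading or current.body:
                segments.append(current)
            current = Segment(heading=line)
        else:
            current.body.append(line)
    if current.heading or current.body:
        segments.append(current)
    return segments


def guess_genre(seg: Segment, min_hits: int = 2) -> str:
    """Label a segment 'weather' if enough hypothetical keywords occur, else 'other'."""
    words = re.findall(r"[a-z]+", (seg.heading + " " + seg.text).lower())
    hits = sum(w in WEATHER_KEYWORDS for w in words)
    return "weather" if hits >= min_hits else "other"


if __name__ == "__main__":
    ocr_lines = [
        "THE WEATHER",
        "Fair tonight; rain and lower temperature tomorrow.",
        "Light variable winds.",
        "LOCAL NEWS",
        "The council met on Tuesday to discuss the new bridge.",
    ]
    for seg in segment(ocr_lines):
        print(seg.heading, "->", guess_genre(seg))
```

Running the script prints `THE WEATHER -> weather` and `LOCAL NEWS -> other`. Keyword rules of this kind only suit narrowly defined genres, and OCR errors that corrupt heading capitalization or keywords would degrade both steps.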

Bibliographic Record

Source: Digital Libraries: Universal and Ubiquitous Access to Information, 2008, pp. 379-386 (8 pages)
Conference location: Bali, Indonesia
Authors: Robert B. Allen; Ilya Waldstein; Weizhong Zhu
Affiliation (all authors): College of Information Science and Technology, Drexel University
Original format: PDF
Language: English
Chinese Library Classification: Computer networks
Keywords: categorization; genres; historical newspapers; OCR; segmentation

Similar Literature

1. Europeana Newspapers: searching digitized historical newspapers from 23 European countries [J]. Marieke Willems, Rossitza Atanassova. Insights, 2015, no. 1.
2. Toward a Metadata Standard for Digitized Historical Newspapers [J]. Ray L. Murray. Microform and Imaging Review, 2005, no. 3.
3. VIZRT ANIMATION SOFTWARE DIGITIZES NEWSPAPER REVIEW SEGMENT FOR SKY ITALIA [J]. Advanced Imaging, 2008, no. 2.
4. Automated Processing of Digitized Historical Newspapers: Identification of Segments and Genres [C]. Robert B. Allen, Ilya Waldstein, Weizhong Zhu. International Conference on Asian Digital Libraries, 2008.
5. System integration and image pre-processing for an automated, real-time identification and monitoring system for coral reef fish [D]. Chetan Tonde, 2010.
6. Historical Biography for Translational Medicine: An Important Genre for Translational Science [O]. Simon W. Rabkin, 2015.
7. A Framework for Text Processing and Supporting Access to Collections of Digitized Historical Newspapers [O]. Robert B. Allen, Andrea J. Copeland, Palakorn Achananuparp, 2007.
8. Basic Forest Cover Mapping Using Digitized Remote Sensor Data and Automated Data Processing Techniques [R]. M. E. Coggeshall, R. M. Hoffer, 1973.
