Conference: Digital Libraries: Universal and Ubiquitous Access to Information

Automated Processing of Digitized Historical Newspapers: Identification of Segments and Genres

Abstract

Many historical newspapers are being digitized. We aim to support access to them through text analysis of the OCR'd content. However, the OCR output contains many errors, so extracting meaningful content from it is difficult. We propose a pipeline of processing steps and describe the first two here: segmentation and genre identification. The segmentation procedure, which is based on headings, was quite successful. Genre identification worked well for easily defined genre categories such as weather reports. We also propose additional techniques that may improve accuracy still further.
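
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not detail. The heading rule (a short line set mostly in capitals), the cue-word list, and the names `is_heading`, `segment`, `guess_genre`, and `WEATHER_CUES` are all assumptions made for illustration. Real OCR output would likely need more tolerant rules.

```python
import re

# Hypothetical cue words for one easily defined genre; not taken from the paper.
WEATHER_CUES = {"weather", "forecast", "temperature", "rain", "winds", "cloudy", "fair"}


def is_heading(line: str) -> bool:
    """Assumed heuristic: a short line whose letters are mostly upper case."""
    letters = [c for c in line if c.isalpha()]
    if not letters or len(line.split()) > 8:
        return False
    upper = sum(c.isupper() for c in letters)
    return upper / len(letters) >= 0.8


def segment(text: str) -> list[dict]:
    """Split OCR text into segments, starting a new one at each heading."""
    segments = []
    current = {"heading": None, "body": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no content in this sketch
        if is_heading(line):
            # Close the running segment before opening a new one.
            if current["heading"] is not None or current["body"]:
                segments.append(current)
            current = {"heading": line, "body": []}
        else:
            current["body"].append(line)
    if current["heading"] is not None or current["body"]:
        segments.append(current)
    return segments


def guess_genre(seg: dict) -> str:
    """Label a segment 'weather' if it contains at least two cue words."""
    text = (seg["heading"] or "") + " " + " ".join(seg["body"])
    words = re.findall(r"[a-z]+", text.lower())
    hits = sum(w in WEATHER_CUES for w in words)
    return "weather" if hits >= 2 else "other"
```

Calling `segment` on a page of OCR text and then `guess_genre` on each resulting segment gives one label per segment. A fixed cue-word count is only a stand-in here; it was chosen because the abstract singles out weather reports as an easily defined genre.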
