Extraction of Distinctive Keywords and Articles from Untranscribed Historical Newspaper Images

Abstract

This paper proposes a novel approach to extract distinctive keywords from historical newspaper images without using character recognition. We converted an image of the text block on an entire newspaper page into a sequence of codes based on discretization of the feature vectors, an approach that eliminated the errors introduced by optical character recognition (OCR). This conversion makes it possible to analyze untranscribed newspaper images by using text-processing methods. We examined the daily occurrence of every tri-gram string, and extracted strings with a dense appearance as distinctive keywords. In addition, we highlighted articles that contain distinctive keywords as distinctive articles. The proposed method was evaluated on an archive of Japanese newspaper images published in the 19th century, and the results were promising.
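
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature vectors, the discretization method, or how "dense appearance" is scored. The sketch assumes that feature vectors are mapped to the nearest entry of a given codebook, and that a trigram counts as dense when most of its occurrences fall within a few consecutive days. All function names, parameters, and thresholds (`quantize`, `density`, `window`, `min_total`, `min_density`) are hypothetical.

```python
from collections import Counter, defaultdict


def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Assumption: nearest-neighbour quantization stands in for the paper's
    unspecified discretization step.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sqdist(v, codebook[k]))
        for v in vectors
    ]


def trigrams(codes):
    """Return every run of three consecutive codes as a tuple."""
    return [tuple(codes[i:i + 3]) for i in range(len(codes) - 2)]


def daily_counts(pages_by_day):
    """Count trigram occurrences per day.

    pages_by_day holds one code sequence per day, in date order.
    Returns a mapping from trigram to a list of per-day counts.
    """
    counts = defaultdict(lambda: [0] * len(pages_by_day))
    for day, codes in enumerate(pages_by_day):
        for gram, n in Counter(trigrams(codes)).items():
            counts[gram][day] = n
    return counts


def density(series, window):
    """Largest share of all occurrences inside `window` consecutive days.

    Assumption: this ratio is one possible reading of "dense appearance".
    """
    total = sum(series)
    if total == 0:
        return 0.0
    starts = range(max(1, len(series) - window + 1))
    return max(sum(series[i:i + window]) for i in starts) / total


def distinctive_trigrams(pages_by_day, window=3, min_total=3, min_density=0.8):
    """Return trigrams that occur often enough and are concentrated in time."""
    counts = daily_counts(pages_by_day)
    return sorted(
        gram
        for gram, series in counts.items()
        if sum(series) >= min_total and density(series, window) >= min_density
    )
```

In this sketch, a trigram that appears once on every page over many days gets a low density score and is filtered out, while a trigram whose occurrences cluster in a short run of days is returned as distinctive. When the archive covers no more days than `window`, every trigram scores 1.0, so the score is only informative over longer date ranges.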

Bibliographic details

Source
International Workshop on Advanced Imaging Technology | 2020 | 115151K.1-115151K.6 | 6 pages
Authors
Sora Ito; Kengo Terasawa
Original format: PDF
Keywords
historical documents; historical newspapers; keyword extraction

Similar literature

1. A keyword retrieval system for historical Mongolian document images [J]. Hongxi Wei, Guanglai Gao. International Journal on Document Analysis and Recognition. 2014, Issue 1
2. Efficient Keyword Extraction and Text Summarization for Reading Articles on Smart Phone [J]. Jeong Hyoungil, Ko Youngjoong, Seo Jungyun. Computing and Informatics. 2015, Issue 4
3. Keyword extraction of radio news using term weighting with an encyclopedia and newspaper articles [C]. Fumiyo Fukumoto, Yoshihiro Sekiguchi, Yoshimi Suzuki. Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 1998
4. Keywords at Work: Investigating Keyword Extraction in Social Media Applications [D]. Lahiri, Shibamouli. 2018
5. Information extraction from full text scientific articles: Where are the keywords? [O]. Parantu K Shah, Carolina Perez-Iratxeta, Peer Bork. 2003
6. Automatic Keyword Extraction from Historical Document Images [O]. 2006
