A Crawler-Parser-Based Approach to Newspaper Scraping and Reverse Searching of Desired Articles

Abstract

How often does it happen that we cannot get enough information from a newspaper? An article often mentions a name we have not heard before, or simply does not shed enough light on the news and its details. Online newspapers also suffer from webpage noise: every article page is filled with HTML markup, meta tags, JavaScript, and other clutter. This paper presents a fast and efficient approach to scraping a newspaper to extract any desired article without the noise, and to reverse-searching the same topic on Google to obtain a list of the most relevant information about that article. The algorithm supports ten languages and works with major newspapers such as CNN and BBC.
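
The abstract names two stages, stripping page noise to isolate the article and then reverse-searching its topic on Google, but does not say how either is implemented. The Python sketch below shows one possible approach using only the standard library. It is not the authors' method. The noise-tag set, the use of `<h1>` as the headline and `<p>` as the body text, and the helper names `ArticleTextExtractor`, `extract_article`, and `reverse_search_url` are all hypothetical. The sketch only builds a Google search URL from the headline. It does not fetch or parse the results page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Hypothetical noise filter (an assumption, not the paper's list):
# everything inside these tags is discarded.
NOISE_TAGS = {"script", "style", "noscript", "head", "nav", "footer", "aside"}


class ArticleTextExtractor(HTMLParser):
    """Keeps the first <h1> as the headline and <p> text as the body."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0   # > 0 while inside a noise tag
        self.current = None    # "h1" or "p" while collecting text
        self.buffer = []
        self.headline = ""
        self.paragraphs = []

    def _flush(self):
        # Collapse runs of whitespace and store the collected text.
        text = " ".join("".join(self.buffer).split())
        if text:
            if self.current == "h1" and not self.headline:
                self.headline = text
            elif self.current == "p":
                self.paragraphs.append(text)
        self.current = None
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag in ("h1", "p") and self.noise_depth == 0:
            if self.current:
                self._flush()  # an unclosed block ends where the next one starts
            self.current = tag

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.noise_depth = max(0, self.noise_depth - 1)
        elif tag == self.current:
            self._flush()

    def handle_data(self, data):
        # Inline tags such as <b> are ignored, so their text is kept in place.
        if self.current and self.noise_depth == 0:
            self.buffer.append(data)

    def close(self):
        super().close()
        if self.current:
            self._flush()  # keep a trailing unclosed <p>


def extract_article(html):
    """Returns (headline, list of paragraph strings) with noise removed."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.headline, parser.paragraphs


def reverse_search_url(headline):
    """Builds a Google web-search URL whose query is the headline."""
    return "https://www.google.com/search?" + urlencode({"q": headline})


if __name__ == "__main__":
    page = ("<html><head><title>T</title><script>track()</script></head>"
            "<body><nav>Home | World</nav><h1>Storm hits coast</h1>"
            "<p>Heavy rain fell on <b>Monday</b>.</p><script>ads()</script>"
            "<p>Roads were closed.</p><footer>(c) News</footer></body></html>")
    headline, paras = extract_article(page)
    print(headline)                     # Storm hits coast
    print(paras)                        # ['Heavy rain fell on Monday.', 'Roads were closed.']
    print(reverse_search_url(headline)) # https://www.google.com/search?q=Storm+hits+coast
```

A whitelist of content tags (`<h1>`, `<p>`) combined with a blacklist of noise containers keeps the sketch short. Real news sites vary in their markup, so a production scraper would need per-site or learned rules, and it would need to handle multiple languages, as the abstract describes.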

Bibliographic Record

Source: International Conference on Frontiers of Intelligent Computing: Theory and Applications | 2018 | xx, 583 pages | 9 pages in total
Authors: Ankit Aich; Amit Dutta; Aruna Chakraborty
Original format: PDF
CLC classification: TP301.4-532
Keywords: Reverse searching; Parsing; Crawling; Newspaper

Similar Literature

1. Data Analysis and Visualization of Newspaper Articles on Thirdhand Smoke: A Topic Modeling Approach [J]. Qian Liu, Qiuyi Chen, Jiayi Shen. JMIR Medical Informatics, 2019, No. 1.

2. An alternative approach for statistical single-label document classification of newspaper articles [J]. Georgios Mamakis, Athanasios G. Malamos, J. Andrew Ware. Journal of Information Science, 2011, No. 3.

3. Europeana Newspapers: searching digitized historical newspapers from 23 European countries [J]. Marieke Willems, Rossitza Atanassova. Insights, 2015, No. 1.

4. A Crawler-Parser-Based Approach to Newspaper Scraping and Reverse Searching of Desired Articles [C]. Ankit Aich, Amit Dutta, Aruna Chakraborty. International Conference on Frontiers of Intelligent Computing: Theory and Applications, 2018.

5. Dredging in the Kansas River: An innovative approach to a newspaper article [D]. Stephen T. Lynn. 2007.

6. An Exploratory Study of Health Inequality Discourse Using Korean Newspaper Articles: A Topic Modeling Approach [O]. Jin-Hwan Kim. 2019.

7. The Conceptualization of COVID-19 in English and Kurdish Online Newspaper Articles: A Cognitive Semantic Approach [O]. 2020.
