Web crawler and back-end for news aggregator system (Noox project)

Abstract

The aim of this manuscript is to develop a web crawler, a Content Management System (CMS), and an Application Programming Interface (API) for a news aggregator system dubbed the Noox project. A news aggregator system requires a crawler to populate its content; however, different websites may have different page layouts. This project aims to create a back-end for Noox and an open-source, scalable web crawler capable of extracting data from different page layouts. The scope of the project includes the web crawler, the CMS, the API, a scheduler for the web crawler, and the implementation of a socket server. The development process uses PHP as the primary back-end language, with Laravel as the web application framework. The PHP back-end hosts the CMS and the API. The API follows Representational State Transfer (REST) as its architectural style and uses JavaScript Object Notation (JSON) to communicate with clients. The API also implements JSON Web Token (JWT) for client authentication. The web crawler is written in Python and uses BeautifulSoup as its web extraction utility. It can be adapted to extract data from different page layouts by means of a user-created configuration file, and it can be configured to export the extracted data to a JSON file or a database system. The result of this project satisfies the requirements of the Noox project.
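The abstract does not give the format of the crawler's configuration file, so the following is only a minimal sketch of how configuration-driven extraction might work. It assumes a hypothetical configuration that maps each output field to a tag name and a CSS class, and it uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup so that the example runs without third-party packages. The names `SITE_CONFIGS`, `ConfigExtractor`, and `extract` are illustrative and do not come from the Noox project.

```python
import json
from html.parser import HTMLParser

# Hypothetical per-site configuration (an assumption, not the Noox format):
# each field maps to the tag and class that hold it on that site's layout.
SITE_CONFIGS = {
    "site_a": {
        "title": {"tag": "h1", "class": "headline"},
        "body": {"tag": "div", "class": "article-text"},
    },
    "site_b": {
        "title": {"tag": "h2", "class": "post-title"},
        "body": {"tag": "section", "class": "content"},
    },
}

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class ConfigExtractor(HTMLParser):
    """Collects the text of the first element matching each configured rule.

    Assumes reasonably well-formed HTML (every non-void tag is closed).
    """

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.chunks = {field: [] for field in config}
        self._done = set()    # fields that have already been captured
        self._field = None    # field currently being captured, if any
        self._depth = 0       # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, rule in self.config.items():
            if (field not in self._done and tag == rule["tag"]
                    and rule["class"] in classes):
                self._field, self._depth = field, 1
                return

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._done.add(self._field)
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.chunks[self._field].append(data)


def extract(html, config):
    """Return a dict of field -> whitespace-normalized text for one page."""
    parser = ConfigExtractor(config)
    parser.feed(html)
    parser.close()
    return {field: " ".join(" ".join(parts).split())
            for field, parts in parser.chunks.items()}


if __name__ == "__main__":
    page = ('<h1 class="headline">Hello <b>World</b></h1>'
            '<div class="article-text"><p>First.</p><p>Second.</p></div>')
    record = extract(page, SITE_CONFIGS["site_a"])
    # Export as JSON, one of the two output targets named in the abstract.
    print(json.dumps(record, ensure_ascii=False))
```

In this sketch, supporting a new site layout means adding an entry to the configuration rather than changing the extraction code, which is the kind of adaptability the abstract describes.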


Bibliographic details

Source
2017 IEEE International Conference on Cybernetics and Computational Intelligence, 2017, pp. 56-61 (6 pages)
Conference location: Phuket (TH)
Authors
Raymond Bahana; Rahadian Adinugroho; Ford Lumban Gaol; Agung Trisetyarso; Bahtiar Saleh Abbas; Wayan Suparta;
Author affiliations

Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480;

Computer Science Department, Faculty of Computing and Media, Bina Nusantara University, Jakarta, Indonesia 11480;

Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480;

Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University, Jakarta, Indonesia 11480;

Industrial Engineering Department, Faculty of Engineering, Bina Nusantara University, Jakarta, Indonesia 11480;

Civil Engineering Department, University of Technology Yogyakarta, Indonesia 55285;

Original format: PDF
Language: English
Keywords
Crawlers; Servers; Databases; Artificial intelligence; Libraries; Data mining; Sockets;


Similar literature

1. Usability Evaluation of Clinician Web Back-Ends to Telemonitoring Systems: Two Case-Studies in Scotland [J]. Cristina-Adriana Alexandru, Brian McKinstry. Studies in Informatics and Control, 2012, No. 2

2. Enterprise Application Integration: webMethods partnership fuses on-demand applications with back-end systems [J]. Kevin Parker. Manufacturing Business Technology, 2006, No. 6

3. PDD Crawler: A Focused Web Crawler Using Link and Content Analysis for Relevance Prediction [J]. Prashant Dahiwale, M M Raghuwanshi, Latesh Malik. Computer Science & Information Technology, 2014, No. 11

4. Web crawler and back-end for news aggregator system (Noox project) [C]. Raymond Bahana, Rahadian Adinugroho, Ford Lumban Gaol. IEEE International Conference on Cybernetics and Computational Intelligence, 2017

5. Using WebQuests, Interactive Websites, Software Programs, and Computer Centered Projects to Enhance Knowledge of Ecosystems and Lessen Incidents of Calling Out and Student Disengagement Among 4th Grade Students [D]. Elaine Hetherington. 2011

6. A user-oriented web crawler for selectively acquiring online content in e-health research [O]. Songhua Xu, Hong-Jun Yoon, Georgia Tourassi

7. An Effective Web Ontology Using Web Crawler Systems to Measures Web Similarities [O]. Florence Dayana M, Dr. Chidambaram M. 2017
