2017 IEEE International Conference on Cybernetics and Computational Intelligence

Web crawler and back-end for news aggregator system (Noox project)

Abstract

This manuscript describes the development of a web crawler, a Content Management System (CMS), and an Application Programming Interface (API) for a news aggregator system called the Noox project. A news aggregator needs a crawler to populate its content; however, different websites may use different page layouts. The project aims to create a back-end for Noox and an open-source, scalable web crawler capable of extracting data from different page layouts. The scope of the project covers the web crawler, the CMS, the API, a scheduler for the web crawler, and the implementation of a socket server. The back-end is developed primarily in PHP, with Laravel as the web application framework, and hosts both the CMS and the API. The API follows the Representational State Transfer (REST) architectural style, exchanges data with clients in JavaScript Object Notation (JSON), and uses JSON Web Token (JWT) for client authentication. The web crawler is written in Python and uses BeautifulSoup for web extraction. It can be adapted to different page layouts through a user-created configuration file, and it can be configured to export the extracted data to a JSON file or a database system. The result of this project satisfies the requirements of the Noox project.
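
The idea of adapting one crawler to several page layouts through a configuration file can be sketched roughly as follows. This is only an illustration, not the Noox implementation: the abstract does not give the configuration format, so the rule layout (`tag` plus `class` per field), the names `FieldExtractor`, `extract`, `SITE_A`, `SITE_B`, and the sample pages are all hypothetical. To keep the sketch dependency-free it also swaps in Python's standard-library `html.parser` where the project itself uses BeautifulSoup.

```python
import json
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects the text of the first element matching each configured rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules      # hypothetical format: field -> {"tag", "class"}
        self.result = {}        # field -> extracted text
        self._active = None     # field currently being captured, if any
        self._depth = 0         # nesting depth of tags with the target's name
        self._chunks = []       # text pieces seen inside the active element

    def handle_starttag(self, tag, attrs):
        if self._active is not None:
            # Inside a capture: only track nested tags of the same name.
            if tag == self.rules[self._active]["tag"]:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, rule in self.rules.items():
            if field in self.result:
                continue  # keep only the first match per field
            if tag == rule["tag"] and rule["class"] in classes:
                self._active, self._depth, self._chunks = field, 1, []
                return

    def handle_endtag(self, tag):
        if self._active is None or tag != self.rules[self._active]["tag"]:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join(" ".join(self._chunks).split())
            self.result[self._active] = text
            self._active = None

    def handle_data(self, data):
        if self._active is not None:
            self._chunks.append(data)


def extract(html, rules):
    """Apply one site's rules to one page and return a dict of fields."""
    parser = FieldExtractor(rules)
    parser.feed(html)
    parser.close()
    return parser.result


# Two made-up sites with different layouts, each with its own rules.
SITE_A = {"title": {"tag": "h1", "class": "headline"},
          "body": {"tag": "div", "class": "story"}}
SITE_B = {"title": {"tag": "h2", "class": "post-title"},
          "body": {"tag": "section", "class": "post-body"}}

PAGE_A = ('<html><body><h1 class="headline">Noox launches</h1>'
          '<div class="story"><p>First paragraph.</p>'
          '<p>Second paragraph.</p></div></body></html>')
PAGE_B = ('<article><h2 class="post-title">Crawler update</h2>'
          '<section class="post-body">Layout differs here.</section></article>')

if __name__ == "__main__":
    # Export step: serialize each extracted record as JSON.
    for page, rules in ((PAGE_A, SITE_A), (PAGE_B, SITE_B)):
        print(json.dumps(extract(page, rules)))
```

In this sketch, supporting a new site means writing a new rule dictionary rather than new parsing code, which is the property the abstract attributes to the user-created configuration file. Writing the records to a database instead of JSON would replace only the final export step.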