Research on crawling mechanism and policy for crawling product information from mobile internet

Shu Wang; Jia Chen; Chonghuan Xu

Abstract

Product information on the mobile internet is growing rapidly in volume and becoming harder to acquire. Companies tend to deliver product information through well-tuned mobile websites or through websites that are responsive to various mobile devices. Such a site is therefore more of a web application than a traditional website; we call it a rich internet application (RIA). In RIAs, information is hidden from search engine spiders in the deep web by means of HTML5, Ajax and other scripting techniques, and user interactions are needed to trigger certain prescribed events in a specific order before the full picture of the required information is shown. In this paper, we identify the crux of the problem as providing a mechanism that parses scripts and manipulates the document object model (DOM), together with a policy that triggers user events and runs the scraping process. We formulate a new mechanism and policy based on web crawler techniques and on studies of Ajax-specific web crawlers. By remodelling web pages, redesigning the architecture of the web crawler and refining the scraping algorithm, we successfully scrape product data from mobile internet RIAs.
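
The abstract splits RIA scraping into a mechanism (run page scripts, read the resulting DOM) and a policy (decide which user events to fire, and in what order). The sketch below is not the authors' method; it only illustrates the general state-exploration idea used by Ajax crawlers, under stated assumptions. A hypothetical in-memory `FakeRIA` stands in for a headless browser, each UI state is identified by a hash of its DOM, events are treated as transitions between states, a breadth-first policy fires every event that leads to an unexplored state, and a naive scraper collects items from an assumed `<li class="product">` markup. All names (`FakeRIA`, `crawl`, `extract_products`, the event labels) are illustrative.

```python
"""Toy breadth-first crawl of a rich internet application (RIA).

Hypothetical sketch: the RIA is an in-memory state machine rather than a real
browser, and every name here is illustrative, not taken from the paper.
"""
from collections import deque
from dataclasses import dataclass
from hashlib import sha256


@dataclass
class FakeRIA:
    """Stand-in for a headless browser session (hypothetical)."""
    doms: dict[str, str]                    # state id -> rendered DOM as HTML text
    transitions: dict[str, dict[str, str]]  # state id -> {event name: next state id}
    initial: str = "home"

    def events(self, state: str) -> list[str]:
        """User events (taps, scrolls, ...) that can be fired in this state."""
        return sorted(self.transitions.get(state, {}))

    def fire(self, state: str, event: str) -> str:
        """Pretend to run the page script bound to `event`; return the new state id."""
        return self.transitions[state][event]


def dom_fingerprint(dom: str) -> str:
    """Identify a UI state by hashing its DOM, so identical states are explored once."""
    return sha256(dom.encode("utf-8")).hexdigest()


def extract_products(dom: str) -> list[str]:
    """Naive scraper: text between <li class="product"> and </li> (assumed markup)."""
    start_tag, end_tag = '<li class="product">', "</li>"
    found, pos = [], 0
    while (i := dom.find(start_tag, pos)) != -1:
        j = dom.find(end_tag, i)
        if j == -1:
            break
        found.append(dom[i + len(start_tag):j].strip())
        pos = j + len(end_tag)
    return found


def crawl(app: FakeRIA, max_states: int = 100) -> dict[str, list[str]]:
    """Breadth-first event policy; returns {event path: products scraped there}."""
    seen = {dom_fingerprint(app.doms[app.initial])}
    queue = deque([(app.initial, ())])  # (state id, events fired to reach it)
    results: dict[str, list[str]] = {}
    while queue:
        state, path = queue.popleft()
        products = extract_products(app.doms[state])
        if products:
            results[" > ".join(path) or "(initial)"] = products
        for event in app.events(state):
            nxt = app.fire(state, event)
            fp = dom_fingerprint(app.doms[nxt])
            if fp not in seen and len(seen) < max_states:  # new state, within budget
                seen.add(fp)
                queue.append((nxt, path + (event,)))
    return results


def demo_app() -> FakeRIA:
    """A three-state mobile shop: products appear only after taps and scrolls."""
    return FakeRIA(
        doms={
            "home": "<div><button>Phones</button></div>",
            "phones": '<ul><li class="product">Phone A</li>'
                      '<li class="product">Phone B</li></ul>',
            "more": '<ul><li class="product">Phone C</li></ul>',
        },
        transitions={
            "home": {"tap:phones": "phones"},
            "phones": {"scroll:load-more": "more", "tap:back": "home"},
            "more": {"tap:back": "home"},
        },
    )


if __name__ == "__main__":
    print(crawl(demo_app()))
    # {'tap:phones': ['Phone A', 'Phone B'], 'tap:phones > scroll:load-more': ['Phone C']}
```

In a real browser a crawler cannot jump straight back to an earlier UI state, so a policy like this one would typically reload the start page and replay the recorded event path before firing the next event; breadth-first order keeps those replay paths as short as possible.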

Bibliographic details

Source: International Journal of Computing Science and Mathematics, 2017, No. 6, pp. 506-525 (20 pages)
Authors: Shu Wang; Jia Chen; Chonghuan Xu
Affiliations:

School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, China;

Ever Maple Food Science and Technology Co., Ltd, Hongsheng Group, Hangzhou, China;

School of Business Administration, Contemporary Business and Trade Research Center, Contemporary Business and Collaborative Innovation Research Center, Zhejiang Gongshang University, Hangzhou, China;

Original format: PDF
Language: English
Keywords: crawler; scrape data; mobile internet; rich internet application; RIA; product information

Similar literature

1. The similarity of crawling mechanisms in aquatic and terrestrial gastropods [J]. Pavlova Galina A. Journal of Comparative Physiology, A. Sensory, Neural, and Behavioral Physiology, 2019, No. 1.
2. Crawling without wiggling: muscular mechanisms and kinematics of rectilinear locomotion in boa constrictors [J]. Newman Steven J., Jayne Bruce C. The Journal of Experimental Biology, 2018, No. 4.
3. Design of a peristaltic crawling robot using 3-D link mechanisms [J]. Norihiko Saga, Satoshi Tesen, Hiroki Dobashi. International Journal of Biomechatronics and Biomedical Robotics, 2013, No. 2a4.
4. Steering and Non-steering Crawling Tetrahedral Micro-mechanisms [C]. D. Margineanu, E. C. Lovasz, K. H. Modler. Conference on Microactuators and Micromechanisms, 2015.
5. Model-based Crawling - An Approach to Design Efficient Crawling Strategies for Rich Internet Applications [D]. Dincturk, Mustafa Emre. 2013.
6. Tractable near-optimal policies for crawling [O]. Yossi Azar, Eric Horvitz, Eyal Lubetzky. 2018.
7. An Effective Fast Searching Algorithm for Internet Crawling Usage [O]. Chia Zhen Hon, Nor Azhar Ahmad. 2016.