Journal: International Journal of Computing Science and Mathematics

Research on crawling mechanism and policy for crawling product information from mobile internet


Abstract

Product information on the mobile internet is growing rapidly in volume and becoming harder to acquire. Companies tend to deliver product information on well-tuned mobile websites or on websites that are responsive to various mobile devices. Such a site is more of a web app than a traditional website, and we call it a rich internet application (RIA). In RIAs, information is hidden from search engine spiders by HTML5, Ajax and other scripting techniques in the deep web; user interactions are needed to trigger prescribed events in a certain order before the full information we need is shown. In this paper, we identify the crux of the problem as how to provide the mechanism to parse the scripts and manipulate the document object model (DOM), and the policy to trigger user events and run the scrape process. A new mechanism and policy were formulated based on web crawler techniques and studies of Ajax-specific web crawlers. By remodelling web pages, redesigning the architecture of the web crawler and refining the scrape algorithm, we successfully scraped product data from mobile internet RIAs.
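
The abstract does not give the authors' algorithm, so the following is only a minimal sketch of the general idea it describes: treating an RIA page as a set of DOM states connected by user events, firing events to reach hidden states, and scraping each state. Everything here is an assumption made for illustration: the toy DOM model (a frozenset of text fragments), the names `crawl_states`, `scrape`, `click_show_list` and `click_details`, and the `product:` prefix. A real crawler would drive a scriptable browser instead of these hand-written handlers.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy model: a "DOM" is the set of text fragments currently visible.
Dom = FrozenSet[str]
# An event handler maps the current DOM to the DOM after the event has fired.
Handler = Callable[[Dom], Dom]
Transition = Tuple[Dom, str, Dom]


def crawl_states(
    initial: Dom, events: Dict[str, Handler], max_states: int = 100
) -> Tuple[Set[Dom], List[Transition]]:
    """Breadth-first exploration of the DOM states reachable by firing events."""
    seen: Set[Dom] = {initial}
    transitions: List[Transition] = []
    queue = deque([initial])
    while queue and len(seen) < max_states:
        dom = queue.popleft()
        for name in sorted(events):  # fixed order keeps the crawl deterministic
            new_dom = events[name](dom)
            if new_dom == dom:
                continue  # the event changed nothing in this state
            transitions.append((dom, name, new_dom))
            if new_dom not in seen:
                seen.add(new_dom)
                queue.append(new_dom)
    return seen, transitions


def scrape(states: Set[Dom]) -> List[str]:
    """Collect every fragment, across all states, that looks like product data."""
    return sorted({frag for dom in states for frag in dom if frag.startswith("product:")})


# Toy page: details only appear after the list has been shown (order matters).
def click_show_list(dom: Dom) -> Dom:
    return dom | {"product:phone"} if "button:show-list" in dom else dom


def click_details(dom: Dom) -> Dom:
    return dom | {"product:phone price=199"} if "product:phone" in dom else dom


start: Dom = frozenset({"button:show-list"})
handlers = {"click_show_list": click_show_list, "click_details": click_details}
states, transitions = crawl_states(start, handlers)
print(scrape(states))
```

In this toy page the price fragment is only reachable by firing `click_show_list` before `click_details`, which mirrors the abstract's point that events must be triggered in a certain order before the full information is visible. The `max_states` cap is a simple guard against state explosion, not a policy taken from the paper.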
