Extracting Structured Data from Ajax Site

Abstract

Ajax is an important approach for enriching interactivity between Web servers and end users in the Web 2.0 era. However, structured data in Ajax Web pages cannot be extracted easily, because the content is loaded asynchronously. In this paper, we propose a technique for extracting structured data from Ajax-based Web pages. First, an AjaxFetcher component fetches the dynamic page content using an embedded browser. Second, two different strategies are used to extract structured data from the fetched page content. In particular, for pages that contain multiple records, we propose an automatic approach to identify each candidate record. Experimental results show that fetching Ajax pages and extracting structured data from them is feasible.
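
The abstract does not spell out how candidate records are identified, so the following is only a minimal sketch of one common heuristic, not the paper's method: among all parent elements, pick the largest group of sibling elements that share the same tag and class, and treat each sibling as a record. It assumes the page HTML has already been rendered (for example by an embedded browser, as AjaxFetcher is said to do). The names `Node`, `TreeBuilder`, `find_records`, and `SAMPLE` are hypothetical and chosen for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so we must not descend into them.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class Node:
    """A very small DOM node: children are Node objects or text strings."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

    def signature(self):
        # Assumption: siblings with equal tag and class look like the same record type.
        return (self.tag, self.attrs.get("class", ""))

    def full_text(self):
        parts = []
        for child in self.children:
            parts.append(child.full_text() if isinstance(child, Node) else child)
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from already-rendered HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def iter_nodes(node):
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)


def find_records(html, min_repeats=2):
    """Return the text of each element in the largest group of look-alike siblings."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best = []
    for parent in iter_nodes(builder.root):
        elements = [c for c in parent.children if isinstance(c, Node)]
        counts = Counter(e.signature() for e in elements)
        if not counts:
            continue
        sig, n = counts.most_common(1)[0]
        if n >= min_repeats and n > len(best):
            best = [e for e in elements if e.signature() == sig]
    return [e.full_text() for e in best]


SAMPLE = """
<div id="results">
  <h2>Books</h2>
  <div class="item"><span>Title A</span> <span>$10</span></div>
  <div class="item"><span>Title B</span> <span>$12</span></div>
  <div class="item"><span>Title C</span> <span>$15</span></div>
</div>
"""

if __name__ == "__main__":
    for record in find_records(SAMPLE):
        print(record)
```

A real multi-record page would likely need a more tolerant similarity measure than exact tag and class equality; this sketch only illustrates the idea of locating repeated structures in fetched page content.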
