Conference: International Conference on Web Engineering

Revisiting Web Data Extraction Using In-Browser Structural Analysis and Visual Cues in Modern Web Designs

Abstract

Recent trends in website design affect the methods used for web data extraction. Many existing methods rely on structural analysis of web pages, yet since the introduction of CSS, table-based layouts are no longer used. Meanwhile, responsive design makes layout and presentation dependent on the browsing context, which also makes the use of visual cues more complex. We present DeepDesign, a system that semi-automatically extracts data records from web pages based on a combination of structural and visual features. It runs in a general-purpose browser, taking advantage of direct access to the full range of CSS3 features and the ability to trigger and execute JavaScript in the page. The user sees record matches in real time and can adapt the process dynamically if required. We present the details of the matching algorithms and evaluate them on the top ten Alexa websites.
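The abstract does not detail DeepDesign's matching algorithms, so the following is only a minimal Python sketch of the general idea it names: scoring candidate records on both structural and visual features. It assumes elements have already been reduced to plain objects (in a browser, box sizes and styles could be read with `getBoundingClientRect()` and `getComputedStyle()`). The tag-path Jaccard measure, the chosen visual features, the `alpha` weight and the `threshold` are all hypothetical choices, not the paper's method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A DOM element reduced to its tag, its children and a few visual features.

    The visual fields are placeholders for values a browser could report via
    getBoundingClientRect() and getComputedStyle(); which features to compare
    is an assumption of this sketch.
    """
    tag: str
    children: list[Node] = field(default_factory=list)
    width: float = 0.0
    height: float = 0.0
    font_size: float = 16.0
    color: str = "#000000"


def tag_paths(node: Node, prefix: str = "") -> set[str]:
    """Collect every tag path in the subtree, e.g. {'li', 'li/a', 'li/span'}."""
    path = f"{prefix}/{node.tag}" if prefix else node.tag
    paths = {path}
    for child in node.children:
        paths |= tag_paths(child, path)
    return paths


def structural_similarity(a: Node, b: Node) -> float:
    """Jaccard similarity of the two subtrees' tag-path sets (0.0 to 1.0)."""
    pa, pb = tag_paths(a), tag_paths(b)
    return len(pa & pb) / len(pa | pb)


def _ratio(x: float, y: float) -> float:
    """1.0 for equal non-negative values, falling towards 0.0 as they diverge."""
    if x == y:
        return 1.0
    return min(x, y) / max(x, y)


def visual_similarity(a: Node, b: Node) -> float:
    """Mean agreement of the two root elements' box size, font size and color.

    Only the subtree roots are compared; children's visual features are ignored.
    """
    scores = [
        _ratio(a.width, b.width),
        _ratio(a.height, b.height),
        _ratio(a.font_size, b.font_size),
        1.0 if a.color == b.color else 0.0,
    ]
    return sum(scores) / len(scores)


def match_records(example: Node, candidates: list[Node],
                  alpha: float = 0.5, threshold: float = 0.8) -> list[Node]:
    """Keep candidates whose weighted structural + visual score reaches threshold."""
    matches = []
    for cand in candidates:
        score = (alpha * structural_similarity(example, cand)
                 + (1 - alpha) * visual_similarity(example, cand))
        if score >= threshold:
            matches.append(cand)
    return matches


# Usage: two list items shaped like the example record, plus an unrelated banner.
def record() -> Node:
    return Node("li", [Node("a"), Node("span")], width=300, height=60)


example = record()
items = [record(), record(), Node("div", [Node("img")], width=300, height=250)]
print(len(match_records(example, items)))  # 2
```

Here the banner shares the example's width but neither its tag paths nor its height, so its combined score (about 0.4) stays below the threshold, while the two list items match on both criteria.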
