Journal: IEEE Transactions on Knowledge and Data Engineering

ViDE: A Vision-Based Approach for Deep Web Data Extraction

Abstract

Deep Web contents are accessed through queries submitted to Web databases, and the returned data records are wrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem because of the intricate underlying structure of such pages. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they depend on the Web page programming language. Because Web pages are a popular two-dimensional medium, their contents are always displayed in a regular layout for users to browse. This motivates us to seek a different approach to deep Web data extraction that overcomes the limitations of previous work by exploiting common visual features of deep Web pages. In this paper, we propose a novel vision-based approach that is independent of the Web page programming language. The approach primarily uses the visual features of deep Web pages to perform deep Web data extraction, including data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the amount of human effort needed to produce perfect extraction. Our experiments on a large set of Web databases show that the proposed vision-based approach is highly effective for deep Web data extraction.