Conference: 2014 International Conference on Electronic Systems, Signal Processing, and Computing Technologies

Using Visual Clues Concept for Extracting Main Data from Deep Web Pages

Abstract

Extracting data from deep Web pages is a challenging problem because of the intricate underlying structure of such pages. Many techniques have been proposed to address this problem, but all of them have inherent limitations because they depend on the programming language of the Web page. The content of Web pages, however, is always displayed in a regular layout for users to browse. This suggests a different way to approach deep Web data extraction: overcoming the limitations of previous work by exploiting common visual features of deep Web pages. In this paper, we propose a vision-based approach that is independent of the Web page programming language. The approach uses the visual features of deep Web pages to extract data, covering both data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the human effort needed to produce an exact extraction. Our implementation, applied to a large set of Web databases, shows that the proposed vision-based approach is highly effective for extracting data from deep Web pages.
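
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of using visual features rather than markup to locate data records. It assumes the page has already been rendered and each visible block is available as a bounding box in pixels; the `Block` type, the `group_records` function, and the tolerance values are hypothetical names and numbers chosen for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A rendered page block: bounding box in pixels plus its visible text.

    Hypothetical structure; the coordinates are assumed to come from
    whatever rendering engine is used to lay out the page.
    """
    x: int
    y: int
    width: int
    height: int
    text: str


def group_records(blocks, x_tol=2, w_tol=4):
    """Return the largest group of blocks sharing a left edge and width.

    Assumption for this sketch: data records on a result page are
    left-aligned with each other and have similar widths, so the largest
    such group is taken as the record list. No tag or script information
    is consulted, only layout.
    """
    groups = []
    # Visit blocks top to bottom so each group stays in reading order.
    for block in sorted(blocks, key=lambda b: (b.y, b.x)):
        for group in groups:
            ref = group[0]
            same_left = abs(block.x - ref.x) <= x_tol
            same_width = abs(block.width - ref.width) <= w_tol
            if same_left and same_width:
                group.append(block)
                break
        else:
            # No existing group matches: start a new one.
            groups.append([block])
    return max(groups, key=len) if groups else []


# Usage with made-up coordinates: a header, a sidebar, and three records.
page = [
    Block(0, 0, 800, 60, "Header"),
    Block(0, 70, 150, 400, "Nav"),
    Block(160, 70, 600, 90, "Record 1"),
    Block(160, 170, 601, 90, "Record 2"),
    Block(161, 270, 600, 90, "Record 3"),
]
records = group_records(page)
```

Because the grouping relies only on rendered positions and sizes, the same code applies whether the page was produced by static HTML or by scripts, which is the kind of language independence the abstract argues for. A real system would need further cues to split each record into data items.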
