Conference: The 5th International Conference on Pervasive Computing and Applications

An approach based on extracted data for wrapper maintenance



Abstract

Extracting data from Web pages using wrappers is a fundamental problem that arises in a wide variety of applications of great practical interest. Web data extraction involves two main issues: wrapper generation and wrapper maintenance. In this paper, we propose a novel approach to automatic wrapper maintenance. It is based on the observation that, despite various page changes, many important features of the pages are preserved, such as the syntactic patterns, annotations, and content of the extracted data items. The approach uses these preserved features to identify the locations of the desired values in the changed pages, after which the wrappers can be repaired. Experiments on real Web sites show that the proposed approach can effectively maintain wrappers so that they extract the desired data accurately.
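The abstract does not specify how the preserved features are combined. The Python sketch below shows only one way relocation could work, under assumed details. It treats a changed page as a sequence of text nodes and takes the node immediately before a value as that value's annotation. Each candidate is scored with three signals: the old value's syntactic pattern, the old annotation, and content similarity to the old value, combined with arbitrary weights. All names (`relocate_value`, `syntactic_pattern`, `normalize_label`), the weights, the pattern scheme and the sample page are hypothetical illustrations, not the paper's method.

```python
import re
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Tokenizer used to generalize a value into a syntactic pattern (assumed scheme).
TOKEN = re.compile(r"(?P<num>\d+)|(?P<alpha>[A-Za-z]+)|(?P<space>\s+)|(?P<other>.)", re.DOTALL)


class TextNodeCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML page in document order."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(text)


def syntactic_pattern(value):
    """Hypothetical generalization: digit runs and ASCII letter runs become classes."""
    parts = []
    for m in TOKEN.finditer(value):
        kind = m.lastgroup
        if kind == "num":
            parts.append(r"\d+")
        elif kind == "alpha":
            parts.append(r"[A-Za-z]+")
        elif kind == "space":
            parts.append(r"\s+")
        else:
            parts.append(re.escape(m.group()))  # punctuation and symbols kept literally
    return re.compile("".join(parts))


def normalize_label(label):
    """Compare annotations case-insensitively, ignoring a trailing colon."""
    return label.strip().rstrip(":").strip().lower()


def relocate_value(changed_html, old_value, old_annotation, weights=(0.4, 0.3, 0.3)):
    """Return (index, text, score) of the text node that best matches the old value.

    Signals (weights are hypothetical): syntactic pattern of the old value,
    annotation taken to be the preceding text node, and content similarity.
    """
    collector = TextNodeCollector()
    collector.feed(changed_html)
    collector.close()
    nodes = collector.nodes

    pattern = syntactic_pattern(old_value)
    target_label = normalize_label(old_annotation)
    w_syn, w_ann, w_con = weights

    best = (None, None, 0.0)
    for i, text in enumerate(nodes):
        syn = 1.0 if pattern.fullmatch(text) else 0.0
        ann = 1.0 if i > 0 and normalize_label(nodes[i - 1]) == target_label else 0.0
        con = SequenceMatcher(None, old_value, text).ratio()
        score = w_syn * syn + w_ann * ann + w_con * con
        if score > best[2]:
            best = (i, text, score)
    return best


if __name__ == "__main__":
    # Hypothetical changed page: the record is now a table and the price itself changed.
    changed_page = (
        "<table>"
        "<tr><td>Name</td><td>Blue Mug</td></tr>"
        "<tr><td>Price:</td><td>$24.50</td></tr>"
        "<tr><td>Shipping:</td><td>$4.99</td></tr>"
        "</table>"
    )
    index, text, score = relocate_value(changed_page, "$19.99", "Price:")
    print(index, text, round(score, 2))
```

In this example both prices share the generalized `$19.99` shape, so the syntactic pattern alone cannot decide. The preserved `Price:` annotation breaks the tie in favor of `$24.50`. A text-node index is not stable across page changes, so a repair step would presumably derive a new extraction rule (for example, a path) for the relocated node. This sketch stops at relocation.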
