An approach based on extracted data for wrapper maintenance


Abstract

Extracting data from Web pages with wrappers is a fundamental problem that arises in a wide variety of applications of great practical interest. Web data extraction raises two main issues: wrapper generation and wrapper maintenance. In this paper, we propose a novel approach to automatic wrapper maintenance. It is based on the observation that, despite various page changes, many important features of the pages are preserved, such as the syntactic patterns, annotations, and content of the extracted data items. The approach uses these preserved features to locate the desired values in the changed pages, so that the wrappers can then be repaired. Experiments on real Web sites show that the proposed approach can effectively maintain wrappers so that they continue to extract the desired data accurately.
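
The idea of relocating values through preserved features can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It is a minimal illustration under stated assumptions: the function names (`syntactic_pattern`, `locate_values`), the coarse token-type pattern, and the scoring weights are all hypothetical, and the "annotation" is assumed to be a label text that immediately precedes the value.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes of an HTML page, in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts.append(text)


def text_nodes(html):
    """Return the stripped, non-empty text nodes of an HTML string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


def syntactic_pattern(value):
    """Map a string to a coarse token-type pattern (hypothetical scheme).

    Runs of digits become '9', runs of letters become 'A', and any other
    non-space character is kept as is.
    """
    tokens = re.findall(r"[A-Za-z]+|\d+|\S", value)
    kinds = []
    for tok in tokens:
        if tok.isdigit():
            kinds.append("9")
        elif tok.isalpha():
            kinds.append("A")
        else:
            kinds.append(tok)
    return " ".join(kinds)


def locate_values(old_values, annotation, changed_html):
    """Rank text nodes of the changed page by similarity to old extractions.

    Assumed scoring: +2 if the node equals a previously extracted value,
    +1 if it shares a syntactic pattern with one, and +1 more if the
    preceding text node contains the annotation (label) text.
    """
    known = set(old_values)
    patterns = {syntactic_pattern(v) for v in old_values}
    nodes = text_nodes(changed_html)
    scored = []
    for i, node in enumerate(nodes):
        score = 0
        if node in known:
            score += 2
        if syntactic_pattern(node) in patterns:
            score += 1
        if score and i > 0 and annotation.lower() in nodes[i - 1].lower():
            score += 1
        if score:
            scored.append((score, node))
    # Stable sort: ties keep document order.
    scored.sort(key=lambda pair: -pair[0])
    return [node for _, node in scored]


if __name__ == "__main__":
    changed_page = (
        "<div><span>Title:</span><b>Web Mining</b>"
        "<span>Price:</span><em>$9.99</em><p>Ships in 3 days</p></div>"
    )
    print(locate_values(["$12.50", "$8.00"], "Price", changed_page))
```

In this sketch, the values a wrapper extracted before the page changed act as the training signal: once a candidate node is located in the changed page, its new position could be used to rewrite the wrapper's extraction rule. That repair step is outside the scope of the sketch.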

Bibliographic details

Source: The 5th International Conference on Pervasive Computing and Applications, 2010, pp. 88-92 (5 pages)
Original format: PDF
Chinese Library Classification: algorithm theory
Keywords: Web data extraction; Web data integration; wrapper maintenance

Similar literature

1. A novel filter-wrapper hybrid gene selection approach for microarray data based on multi-objective forest optimization algorithm [J]. Nouri-Moghaddam N., Ghazanfari M., Fathian M. Decision Science Letters, 2020, No. 3
2. Wrapper Maintenance: A Machine Learning Approach [J]. Knoblock C. A., Lerman K., Minton S. N. The Journal of Artificial Intelligence Research, 2003, No. 10
3. Wrapper Maintenance: A Machine Learning Approach [J]. Kristina Lerman, Steven N. Minton, Craig A. Knoblock. The Journal of Artificial Intelligence Research, 2003, No. 0
4. An approach based on extracted data for wrapper maintenance [C]. {missing} International Conference on Pervasive Computing and Applications, 2010
5. Data content mining: Extracting and cataloging content-based metadata from satellite images (remote sensing) [D]. Harberts, Robert Lawrence. 1996
6. Extracting Value from Industrial Alarms and Events: A Data-Driven Approach Based on Exploratory Data Analysis [O]. Aguinaldo Bezerra, Ivanovitch Silva, Luiz Affonso Guedes. 2019
7. Wrapper Maintenance: A Machine Learning Approach [O]. Knoblock, C. A., Lerman, K., Minton, S. N. 2011
8. TRANSPORTATION-RELATED DATA BASES EXTRACTED FROM THE NATIONAL INDEX OF ENERGY AND ENVIRONMENTAL DATA BASES: PART 1, DIGEST OF DETAILED DATA BASE DESCRIPTIONS [R]. E. W. Birss, J. W. Yeh. 1976
