Schema-Guided Wrapper Maintenance for Web-Data Extraction


Abstract

Extracting data from Web pages using wrappers is a fundamental problem that arises in a wide variety of applications of great practical interest. Two main issues are relevant to Web-data extraction: wrapper generation and wrapper maintenance. In this paper, we propose a novel schema-guided approach to the problem of automatic wrapper maintenance. It is based on the observation that, despite various page changes, many important features of the pages are preserved, such as the syntactic patterns, annotations, and hyperlinks of the extracted data items. Our approach uses these preserved features to identify the locations of the desired values in the changed pages and repairs the wrappers accordingly by inducing semantic blocks from the HTML tree. Our extensive experiments on real Web sites show that the proposed approach can effectively maintain wrappers so that they extract the desired data with high accuracy.
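
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of relocating a value through preserved features, not the authors' method. It assumes two features survive a page redesign: a syntactic pattern (here a regular expression for a price) and an annotation (the label text that appears just before the value). The names `TextCollector`, `locate_value`, and `NEW_PAGE`, the sample HTML, and the scoring rule are all hypothetical.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect (tag_path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # (tag_path, text) in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray close tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def locate_value(html, pattern, annotation):
    """Return (tag_path, text) of the best candidate, or None.

    Assumed scoring: a node must match the syntactic pattern in full;
    it scores higher if the preceding text node contains the annotation.
    """
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    best, best_score = None, 0
    for i, (path, text) in enumerate(parser.nodes):
        if not re.fullmatch(pattern, text):
            continue
        score = 1
        if i > 0 and annotation.lower() in parser.nodes[i - 1][1].lower():
            score += 1
        if score > best_score:
            best, best_score = (path, text), score
    return best


# Hypothetical changed page: the price moved from a table cell to a <b> tag.
NEW_PAGE = (
    "<div><em>Shipping</em><i>$4.00</i>"
    "<span>Price:</span><b>$13.99</b></div>"
)

print(locate_value(NEW_PAGE, r"\$\d+\.\d{2}", "Price:"))
# → ('div/b', '$13.99')
```

The returned tag path is the kind of information a wrapper repair step could use to rewrite its extraction rule; how the paper induces semantic blocks from the HTML tree is not covered by this sketch.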

Bibliographic details

Source
ACM (Association for Computing Machinery) International Workshop on Web Information and Data Management (WIDM 2003); November 7-8, 2003; New Orleans, LA, US | 2003 | pp. 1-8 | 8 pages
Conference location: New Orleans, LA (US)
Authors
Xiaofeng Meng; Dongdong Hu; Chen Li
Author affiliation

School of Information, Renmin University of China, Beijing 100872, China

Original format: PDF
Language of text: English
Chinese Library Classification: Computer networks
Keywords
web; extraction; wrapper; maintenance; schema

Similar literature

1. Volunteering for Linked Data Wrapper maintenance: A platform perspective [J]. Azpeitia Iker, Iturrioz Jon, Diaz Oscar. Information Systems. 2020, Issue Mara
2. Wrapper Maintenance: A Machine Learning Approach [J]. Knoblock C. A., Lerman K., Minton S. N. The Journal of Artificial Intelligence Research. 2003, Issue 10
3. Wrapper Maintenance: A Machine Learning Approach [J]. Kristina Lerman, Steven N. Minton, Craig A. Knoblock. The Journal of Artificial Intelligence Research. 2003, Issue 0
4. Schema-Guided Wrapper Maintenance for Web-Data Extraction [C]. Xiaofeng Meng, Dongdong Hu, Chen Li. Association for Computing Machinery International Workshop on Web Information and Data Management. 2003
5. Scalable Detection and Extraction of Data in Lists in OCRed Text for Ontology Population Using Semi-Supervised and Unsupervised Active Wrapper Induction [D]. Packer, Thomas L. 2014
6. Indian scorpions collected in Karnataka: maintenance in captivity, venom extraction and toxicity studies [O]. Santhosh Kambaiah Nagaraj, Pavana Dattatreya, Thippeswamy Nayaka Boramuthi. 2015
7. Schema-Guided Wrapper Maintenance for Web-Data Extraction [O]. Xiaofeng Meng, Dongdong Hu, Chen Li. 2003
