Automatic Generation of Wrapper for Data Extraction from the Web

Abstract

With the development of the Internet, the Web has become an invaluable information source. To use this information for more than human browsing, web pages in HTML must be converted into a format that is meaningful to software programs. Wrappers are a useful technique for converting HTML documents into semantically meaningful XML files. In this paper, we propose a data extraction approach based on an extraction schema: it automatically generates a wrapper that extracts data from an HTML document and produces an XML document conforming to a given DTD. After the user defines the extraction schema in the form of a DTD, the wrapper is generated automatically by an induction and learning algorithm. Experiments indicate that the approach extracts the required data from the source document with high accuracy.
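The general idea of a wrapper that maps HTML onto a schema-shaped XML document can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not describe the induction step, so the extraction rules below are written by hand, and the schema (`book` with `title` and `price`), the `RULES` table, and the sample page are all hypothetical names chosen for illustration. Only the Python standard library is used.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical target schema, standing in for a user-supplied DTD such as:
#   <!ELEMENT book (title, price)>
RECORD = "book"
FIELDS = ["title", "price"]

# Hypothetical extraction rules: field -> (HTML tag, class attribute).
# A wrapper generator would induce these; here they are hand-written.
RULES = {"title": ("span", "t"), "price": ("span", "p")}


class FieldCollector(HTMLParser):
    """Collects the text of every element that matches a rule."""

    def __init__(self, rules):
        super().__init__()
        self.lookup = {rule: field for field, rule in rules.items()}
        self.current = None  # field whose text is currently being read
        self.values = {field: [] for field in rules}

    def handle_starttag(self, tag, attrs):
        self.current = self.lookup.get((tag, dict(attrs).get("class")))
        if self.current is not None:
            self.values[self.current].append("")

    def handle_endtag(self, tag):
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.values[self.current][-1] += data


def wrap(html):
    """Extract the schema fields from html and return them as an XML string."""
    collector = FieldCollector(RULES)
    collector.feed(html)
    collector.close()
    root = ET.Element(RECORD + "s")
    # Combine the per-field value lists into one record per row.
    for row in zip(*(collector.values[f] for f in FIELDS)):
        record = ET.SubElement(root, RECORD)
        for field, text in zip(FIELDS, row):
            ET.SubElement(record, field).text = text.strip()
    return ET.tostring(root, encoding="unicode")


page = '<ul><li><span class="t">Web Data</span> <span class="p">30</span></li></ul>'
print(wrap(page))
```

In this sketch the schema fixes the shape of the output, while the rules decide where each value comes from in the page. An automatic approach of the kind the abstract describes would presumably replace the hand-written `RULES` table with rules learned from example pages.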

Bibliographic details

Source: International Conference on Web Engineering, 2003, 5 pages
Authors: Suzhi Zhang; Zhengding Lu
Original format: PDF
Chinese Library Classification: TP393.4-532
