Extracting structured data from Web pages (Poster)

Abstract

Many Web sites contain large collections of "structured" Web pages. These pages encode data from an underlying structured source and are typically generated dynamically. Our goal is to extract structured data from such a collection of pages automatically, without any human input such as manually generated rules or training sets. Extracting structured data gives us greater querying power over the data and is useful in information integration systems. Our approach consists of two stages. In the first stage, the unknown template used to create the pages is deduced. In the second stage, the deduced template is used to extract the values. We focus on the first stage since it is more challenging. The full version contains a formal definition of high occurrence correlation and our algorithm. We evaluated our approach on 9 real collections of pages.
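
The two stages described in the abstract can be illustrated with a toy sketch. This is not the algorithm from the paper: the abstract refers to occurrence correlation, which is not modelled here. The sketch assumes, purely for illustration, that the template is a token subsequence common to all pages (found by folding a pairwise longest common subsequence over the collection) and that whatever sits between template tokens is a value. The names `tokenize`, `deduce_template`, and `extract_values` are hypothetical.

```python
import re
from functools import reduce


def tokenize(page):
    """Split a page into HTML tags and the text runs between them."""
    return [t.strip() for t in re.findall(r"<[^>]+>|[^<]+", page) if t.strip()]


def lcs(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def deduce_template(pages):
    """Stage 1 (simplified assumption): tokens shared, in order, by every page."""
    return reduce(lcs, (tokenize(p) for p in pages))


def extract_values(template, page):
    """Stage 2: collect the token runs that fall between template tokens."""
    values, gap, k = [], [], 0
    for tok in tokenize(page):
        if k < len(template) and tok == template[k]:
            if gap:
                values.append(" ".join(gap))
                gap = []
            k += 1
        else:
            gap.append(tok)
    if gap:
        values.append(" ".join(gap))
    return values


if __name__ == "__main__":
    pages = [
        "<html><b>Title:</b> Dune <b>Price:</b> 9.99</html>",
        "<html><b>Title:</b> Emma <b>Price:</b> 4.50</html>",
        "<html><b>Title:</b> Ulysses <b>Price:</b> 12.00</html>",
    ]
    template = deduce_template(pages)
    for page in pages:
        print(extract_values(template, page))
```

A known limitation of this simplification is that a value which happens to be identical on every page is absorbed into the template, and optional or repeated fields are not handled at all.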

Bibliographic details

Source
Knowledge-Based Systems for Safety Critical Applications | 1994 | p. 698 | 1 page
Authors
Arvind Arasu; Garcia-Molina H.
Author affiliation
Stanford Univ., CA, USA
Original format: PDF
Text language: English
Chinese Library Classification: Automation technology, computer technology

Similar literature

1. A Pure Visual Approach for Automatically Extracting and Aligning Structured Web Data [J]. Estuka Fadwa, Miller James. ACM Transactions on Internet Technology. 2019, No. 4
2. Extracting lists of data records from semi-structured web pages [J]. Manuel Alvarez, Alberto Pan, Juan Raposo. Data & Knowledge Engineering. 2008, No. 2
3. POSTER: Two Concurrent Data Structures for Efficient Datalog Query Processing [J]. Herbert Jordan, Bernhard Scholz, Pavle Subotic. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2018, No. 1
4. Extracting structured data from Web pages (Poster) [C]. Arvind Arasu, Garcia-Molina, H. 2003
5. Extracting and managing structured web data [D]. Cafarella, Michael John. 2009
6. Software Agents for Extracting Aggregating and Updating Data from Web Pages of Genomic Databanks [O]. Andrea Stella, Marco Masseroli, Myriam Alcalay. 2002
7. Intelligent assistant for extracting semi-structured web data [O]. Adžič Nik. 2016
