ROADRUNNER: Towards Automatic Data Extraction from Large Web Sites

Abstract

The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach.
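
The idea of deriving a wrapper from the similarities and differences between pages can be illustrated with a minimal sketch. This is not the paper's matching algorithm: it swaps in a generic sequence alignment from Python's `difflib`, and the names `tokenize`, `infer_template`, `extract`, and the `#PCDATA` / `#OPTIONAL` markers are hypothetical choices made for illustration. It assumes two pages produced by the same template, treats tokens that agree as template, and treats text tokens that differ as data fields.

```python
import re
from difflib import SequenceMatcher

# A token is either a whole tag or a run of text between tags.
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def tokenize(html):
    """Split HTML into tag tokens and non-empty, stripped text tokens."""
    return [t.strip() for t in TOKEN.findall(html) if t.strip()]


def is_tag(token):
    return token.startswith("<")


def infer_template(page_a, page_b):
    """Align two pages; shared tokens become template, differing text becomes a field."""
    a, b = tokenize(page_a), tokenize(page_b)
    template = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            template.extend(a[i1:i2])
        elif not any(is_tag(t) for t in a[i1:i2] + b[j1:j2]):
            # Only text differs: assume a data field (simplification).
            template.append("#PCDATA")
        else:
            # Tags differ: only marked here, not resolved into structure.
            template.append("#OPTIONAL")
    return template


def extract(template, page):
    """Return the text found at #PCDATA slots, or None if the page does not fit."""
    tokens = tokenize(page)
    if len(tokens) != len(template):
        return None
    values = []
    for slot, token in zip(template, tokens):
        if slot == "#PCDATA":
            if is_tag(token):
                return None
            values.append(token)
        elif slot != token:
            return None
    return values


if __name__ == "__main__":
    page_a = "<html><b>Title:</b> Databases <i>Price:</i> 30</html>"
    page_b = "<html><b>Title:</b> Compilers <i>Price:</i> 45</html>"
    page_c = "<html><b>Title:</b> Networks <i>Price:</i> 25</html>"
    template = infer_template(page_a, page_b)
    print(template)
    print(extract(template, page_c))
```

The sketch handles only flat pages with string mismatches. Tag mismatches (optional or repeated sections) are flagged but left unresolved, so a page whose structure differs from the two samples is simply rejected by `extract`.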

Bibliographic record

Source
Twenty-Seventh International Conference on Very Large Data Bases, 27th, Sep 11-14th, 2001, Roma, Italy | 2001 | pp. 109-118 | 10 pages
Conference location: Roma (IT)
Authors
Valter Crescenzi; Giansalvatore Mecca; Paolo Merialdo
Affiliation
Università di Roma Tre
Original format: PDF
Language of text: English
Chinese Library Classification: Automation technology, computer technology

Similar literature

1. Extraction of Web Site Evaluation Criteria and Automatic Evaluation [J] . Peng Li, Seiji Yamada Journal of Advanced Computational Intelligence and Intelligent Informatics . 2010, No. 4a76

2. Research on the Automatic Extraction Method of Web Data Objects Based on Deep Learning [J] . Peng Hao, Li Qiao Intelligent automation and soft computing . 2020, No. 3

3. The network of Shanghai Stroke Service System (4S): A public health-care web-based database using automatic extraction of electronic medical records [J] . Dong Yi, Fang Kun, Wang Xin, International journal of stroke: official journal of the International Stroke Society . 2018, No. 5

4. ROADRUNNER: Towards Automatic Data Extraction from Large Web Sites [C] . Paolo Merialdo, Giansalvatore Mecca, Valter Crescenzi International conference on very large data bases . 2001

5. Automatically constructing wrappers for effective and efficient Web information extraction. [D] . Mundluru, Dheerendranath. 2008

6. Automatic Extraction of ICD-O-3 Primary Sites from Cancer Pathology Reports [O] . Ramakanth Kavuluru, Isaac Hands, Eric B. Durbin, 2013

7. Automatic web content extraction for generating tag clouds from Thai web sites [O] . Thanadechteemapat W., Fung C.C. 2011
