Research of Extracting Data from HTML Web Pages Automatically

王茹; 宋瀚涛; 陆玉昌

Abstract

To make use of data on the Internet, it is necessary to extract that data from web pages. An HTT tree model representing HTML pages is presented. Based on the HTT model, a wrapper generation algorithm, AGW, is proposed. The AGW algorithm uses a comparing-and-correcting technique to generate the wrapper, exploiting the native characteristics of the HTT tree structure. The AGW algorithm not only generates the wrapper automatically, but also makes it easy to rebuild the data schema and reduces the computational complexity.
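
The abstract does not define the HTT model or the steps of AGW, so the following is only a minimal Python sketch of the general idea behind tree-based wrapper generation by comparison, not the authors' algorithm. It assumes two pages produced from the same template, with well-formed, properly nested HTML and no unclosed void tags such as `<br>`. The names `Node`, `TreeBuilder`, `parse`, and `data_paths` are hypothetical and chosen for illustration. Each page is parsed into a tag tree, the two trees are walked in parallel, and the positions whose text differs are reported as candidate data fields, while identical text is treated as template.

```python
from html.parser import HTMLParser


class Node:
    """One element of a simple tag tree (illustrative, not the paper's HTT)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.text = ""
        self.children = []
        self.parent = parent


class TreeBuilder(HTMLParser):
    """Builds a tag tree; assumes every start tag has a matching end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Step back up to the parent element.
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def data_paths(a, b, path=""):
    """Walk two same-shaped trees in parallel; return paths whose text differs."""
    found = [path or "/"] if a.text != b.text else []
    for i, (child_a, child_b) in enumerate(zip(a.children, b.children)):
        found.extend(data_paths(child_a, child_b, f"{path}/{child_a.tag}[{i}]"))
    return found


page1 = parse("<div><h1>Price</h1><p>10</p></div>")
page2 = parse("<div><h1>Price</h1><p>25</p></div>")
print(data_paths(page1, page2))  # ['/div[0]/p[1]']
```

In this sketch the shared `<h1>` text is treated as template and the differing `<p>` text as a data slot. A real wrapper generator would also have to handle pages whose trees differ in shape (for example, repeated or optional elements), which this sketch ignores.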

Bibliographic details

Source
《北京理工大学学报：英文版》 (Journal of Beijing Institute of Technology, English Edition) | 2003, Issue S1 | pp. 104-108 | 5 pages
Authors
王茹; 宋瀚涛; 陆玉昌
Author affiliations

Department of Computer Science and Engineering, School of Information Science and Technology, Beijing Institute of Technology, Beijing 100081, China

State Key Laboratory of Intelligent Technology and System, Tsinghua University, Beijing 100084, China

Original format: PDF
Language of text: Chinese (chi)
Chinese Library Classification: TP393.092
Keywords
information; extraction; data; transformation; wrapper; HTML; page
