Automatic Data Extraction from Lists in Web Pages Based on XML

Abstract

This paper proposes an automatic, XML-based method for extracting information from web pages. It exploits the structural similarity of the information in a web page template to build a DOM tree, then derives the record pattern of the page's information automatically by analyzing the PathPattern of that tree. The whole process is fully automatic and requires neither sample collection nor manual labeling. Experiments conducted to test the approach indicate that it is feasible.
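
The abstract does not define PathPattern or the record-detection procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the page has already been converted to well-formed XML, treats a "path pattern" as the tag path from the root to a node, and treats sibling subtrees with identical tag structure as list records. The sample page and the names `signature`, `find_records`, and `extract` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample page, assumed to be already converted to well-formed XML.
PAGE = """
<html><body>
  <h1>Books</h1>
  <ul>
    <li><b>Title A</b><i>10</i></li>
    <li><b>Title B</b><i>12</i></li>
    <li><b>Title C</b><i>9</i></li>
  </ul>
  <p>Footer</p>
</body></html>
"""

def signature(elem):
    """Structural signature of a subtree: tag names only, text ignored."""
    return (elem.tag, tuple(signature(child) for child in elem))

def find_records(root, min_repeat=2):
    """Return (tag path, elements) for the largest group of sibling
    subtrees that share one structural signature (assumed heuristic)."""
    best_path, best = None, []
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        groups = defaultdict(list)
        for child in node:
            groups[signature(child)].append(child)
            stack.append((child, path + "/" + child.tag))
        for sig, members in groups.items():
            if len(members) >= min_repeat and len(members) > len(best):
                best_path, best = path + "/" + sig[0], members
    return best_path, best

def extract(record):
    """Collect the non-empty text fields of one record, in document order."""
    return [text.strip() for text in record.itertext() if text.strip()]

root = ET.fromstring(PAGE.strip())
path, records = find_records(root)
rows = [extract(record) for record in records]
print(path)
for row in rows:
    print(row)
```

No labeled samples are involved: the repeated structure alone selects the three `li` elements, which mirrors the abstract's claim that the process needs no sample collection or manual labeling. Real pages would first need cleaning into well-formed XML, which this sketch does not attempt.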

Bibliographic details

Source: International Conference on Teaching and Computational Science (WTCS 2009), 2012, pp. 915-921 (7 pages)
Authors: Zhou Xin; Wang Hao
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: data extraction; XML; DOM tree; extraction rule; pathpattern

Similar literature

1. Research on the Automatic Extraction Method of Web Data Objects Based on Deep Learning [J]. Peng Hao, Li Qiao. Intelligent Automation and Soft Computing. 2020, No. 3.
2. The network of Shanghai Stroke Service System (4S): A public health-care web-based database using automatic extraction of electronic medical records [J]. Dong Yi, Fang Kun, Wang Xin. International Journal of Stroke: Official Journal of the International Stroke Society. 2018, No. 5.
3. An Automatic Web Data Extraction Approach based on Path Index Trees [J]. Yan Wen, Qingtian Zeng, Hua Duan. International Journal of Performability Engineering. 2018, No. 10.
4. Automatic Data Records Extraction from List Page in Deep Web Sources [C]. Asia-Pacific Conference on Information Processing. 2009.
5. Data extraction from the Web using XML [D]. Ouahid, Hicham. 2001.
6. Validation of an XML-based process to automatically web-enable clinical practice guidelines: experience with the smoking cessation guideline [O]. J. Y. Lukoff, R. H. Dolin, C. S. McKinley. 1999.
7. A framework for automatic generation of web-based data entry applications based on XML [O]. Volker Turau. 2002.
