Automatic data extraction of websites using data path matching and alignment


Abstract

Most web pages present their main information as data records, so extracting those records makes it possible to obtain and integrate data from diverse sources on the Internet. Data extraction from web pages has therefore been a popular research topic over the last decade. The paper aims to automatically extract data records from web pages and to identify the data items within the extracted records. The proposed approach uses Data Path Matching to extract data records and Data Path Code Alignment to identify data items. Experimental results indicate that the method extracts data effectively.
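The abstract names the two techniques without describing them, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that a "data path" is the root-to-leaf sequence of tag names leading to a text node, that records show up as paths which repeat, and that items can be aligned by position within each repeated path. The names `DataPathCollector`, `group_by_path`, and `align_records` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class DataPathCollector(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, root first
        self.items = []  # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closers.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def group_by_path(html):
    """Assumed stand-in for path matching: group text values by tag path."""
    collector = DataPathCollector()
    collector.feed(html)
    collector.close()
    groups = defaultdict(list)
    for path, text in collector.items:
        groups[path].append(text)
    return dict(groups)


def align_records(groups, min_repeat=2):
    """Assumed stand-in for alignment: one row per record, one column per path."""
    repeated = {p: v for p, v in groups.items() if len(v) >= min_repeat}
    if not repeated:
        return [], []
    paths = list(repeated)
    n = min(len(values) for values in repeated.values())
    rows = [[repeated[p][i] for p in paths] for i in range(n)]
    return paths, rows
```

Positional alignment is the simplest possible choice and breaks as soon as a record omits an item; handling such optional items is presumably what a dedicated alignment step is for, but the abstract does not say how the paper does it.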


Bibliographic details

Source: International Conference on Digital Information Processing and Communications, 2015, 5 pages
Authors: Yu-Chun Chu; Chiun-Chieh Hsu; Chen-Jhe Lee; Yu-Ting Tsai
Original format: PDF
CLC classification: Communications
Keywords:
Web sites; data integration; data mining; information retrieval; pattern matching; Internet; Web mining; Web pages; Websites; automatic data extraction; data integration; data items identification; data path code alignment; data path matching; data records extraction; Data mining; HTML; Information filters; Visualization; Web pages; Yttrium; DOM (Document Object Model); Data Path; Web Mining; Web data extraction;


Similar literature


1. Automatic Data Extraction from Websites for Generating Aquatic Product Market Information [J]. YUAN Hong-chun, CHEN Ying, SUN Yue-fu. Journal of Dong Hua University, 2006, No. 6.
2. Automatic extraction of dislocated horizons from 3D seismic data using nonlocal trace matching [J]. Bugge Aina Juell, Lie Jan Erik, Evensen Andreas Kjelsrud. Geophysics: Journal of the Society of Exploration Geophysicists, 2019, No. 6.
3. Automatic fracture–vug identification and extraction from electric imaging logging data based on path morphology [J]. Xi-Ning Li, Jin-Song Shen, Wu-Yang Yang. Petroleum Science, 2019, No. 1.
4. Automatic data extraction of websites using data path matching and alignment [C]. Yu-Chun Chu, Chiun-Chieh Hsu, Chen-Jhe Lee. International Conference on Digital Information Processing and Communications, 2015.
5. Efficient automatic history matching by reducing the observed data [D]. Liu, Bin. 2006.
6. Using automatic alignment to analyze endangered language data: Testing the viability of untrained alignment [O]. Christian DiCanio, Hosung Nam, Douglas H. Whalen.
7. Automatic Data Path Extraction in Large-Scale Register-Transfer Level Designs [O]. Wei Song, Jim Garside, Doug Edwards. 2015.
8. Investigation of Procedures for Automatic Resonance Extraction from Noisy Transient Electromagnetics Data. Volume I. Investigation of Resonance Extraction Procedures [R]. Auton, J. R., Van Blaricum, M. L. 1981.
