Comparison of Web Scraping Techniques: Regular Expression, HTML DOM and Xpath

Abstract

Data collection is the initial stage of research, and the internet offers many data sources that can be used in the research process. The process of extracting data or information from websites is called web scraping. Methods of web scraping include Regular Expression (Regex), HTML DOM and XPath. This study aims to compare the performance of these three web scraping methods. Each method is tested by retrieving data from the target website, and the performance of each run is then measured and compared. Process time, memory usage, and data consumption are used as the measurement parameters in the experiment. The results show that web scraping with the regex method uses the least memory compared with the HTML DOM and XPath methods. The HTML DOM method requires the least processing time and has the smallest data consumption compared with the Regular Expression and XPath methods.
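
The three techniques named above can be illustrated with a minimal sketch. It is not the authors' implementation, and several things in it are assumptions:

- It uses Python's standard library as a stand-in for each technique: `re` for regular expressions, `xml.dom.minidom` for DOM traversal, and the limited XPath subset supported by `xml.etree.ElementTree`.
- `SAMPLE_HTML` is a hypothetical, well-formed page, not the paper's target website. Real-world HTML is often not well-formed XML, which both parsers here require.
- `time.perf_counter` and `tracemalloc` are used only to approximate process time and memory usage.
- Data consumption (bytes transferred over the network) is not measured, because the page is a local string.

```python
import re
import time
import tracemalloc
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical, well-formed sample page (NOT the paper's target website).
SAMPLE_HTML = """<html><body>
<table id="products">
  <tr><td class="name">Keyboard</td><td class="price">25</td></tr>
  <tr><td class="name">Mouse</td><td class="price">10</td></tr>
</table>
</body></html>"""


def scrape_regex(html: str) -> list[str]:
    # Regular expression: match the raw text between the price cell tags,
    # without building any document tree.
    return re.findall(r'<td class="price">([^<]*)</td>', html)


def scrape_dom(html: str) -> list[str]:
    # HTML DOM (stand-in: minidom): build a full document tree, then walk
    # every <td> element and keep those whose class attribute is "price".
    doc = minidom.parseString(html)
    return [
        td.firstChild.data
        for td in doc.getElementsByTagName("td")
        if td.getAttribute("class") == "price" and td.firstChild is not None
    ]


def scrape_xpath(html: str) -> list[str]:
    # XPath (stand-in: ElementTree, which supports only a subset of XPath):
    # select all descendant <td> elements with class="price".
    root = ET.fromstring(html)
    return [td.text or "" for td in root.findall(".//td[@class='price']")]


def measure(scraper, html: str) -> tuple[list[str], float, int]:
    """Return (result, elapsed seconds, peak traced bytes) for one run."""
    tracemalloc.start()
    start = time.perf_counter()
    result = scraper(html)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    for name, fn in [("regex", scrape_regex), ("dom", scrape_dom), ("xpath", scrape_xpath)]:
        result, elapsed, peak = measure(fn, SAMPLE_HTML)
        print(f"{name}: {result} time={elapsed:.6f}s peak_mem={peak}B")
```

All three functions return the same price values in document order. Because `tracemalloc` adds overhead, a careful benchmark would time each method in separate runs without tracing and repeat every measurement several times. The numbers printed by this sketch are not comparable to the paper's results.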

Bibliographic record

Source: International Conference on Industrial Enterprise and System Engineering | 2019 | p. 386 | 5 pages
Authors: Rohmat Gunawan; Alam Rahmatulloh; Irfan Darmawan; Firman Firdaus
Original format: PDF
Chinese Library Classification: TP3-53
Keywords: DOM; Regex; Web scraping; Xpath

Similar documents

1. Perbandingan Metode Web Scraping Menggunakan CSS Selector dan Xpath Selector (Comparison of Web Scraping Methods Using CSS Selector and XPath Selector) [J]. Taufiq Rizaldi, Hermawan Arief Putranto. Teknika, 2017, no. 1.

2. Web Page Analysis Based on HTML DOM and Its Usage for Forum Statistics, Alerts and Geo Targeted Data Retrieval [J]. Robert Gyorodi, Cornelia Gyorodi, George Pecherle. WSEAS Transactions on Computers, 2010, no. 7a9.

3. Personalized Content Extraction and Text Classification Using Effective Web Scraping Techniques [J]. Karthikeyan T., Karthik Sekaran, Ranjith D. International Journal of Web Portals, 2019, no. 2.

4. Comparison of Web Scraping Techniques: Regular Expression, HTML DOM and Xpath [C]. Rohmat Gunawan, Alam Rahmatulloh, Irfan Darmawan. International Conference on Industrial Enterprise and System Engineering, 2019.

5. Block-scoped Access Restriction Technique for HTML Content in Web Browsers [D]. Timothy Watt, 2012.

6. An Evaluation of HTML5 and WebGL for Medical Imaging Applications [O]. Qiusha Min, Zhifeng Wang, Neng Liu, 2018.

7. Perbandingan Metode Web Scraping Menggunakan CSS Selector dan Xpath Selector (Comparison of Web Scraping Methods Using CSS Selector and XPath Selector) [O]. Taufiq Rizaldi, Hermawan Arief Putranto, 2017.