WebSelF: A Web Scraping Framework

Abstract

We present WebSelF, a framework for web scraping that models the process of web scraping and decomposes it into four conceptually independent, reusable, and composable constituents. We have validated our framework through a fully parameterized implementation that is flexible enough to capture previous work on web scraping. We conducted an experiment that evaluated several qualitatively different web scraping constituents (including previous work and combinations thereof) on about 11,000 HTML pages from daily versions of 17 web sites over a period of more than one year. Our framework solves three concrete problems with current web scraping, and our experimental results indicate that compositions of previous techniques and our new ones achieve a higher degree of accuracy, precision, and specificity than existing techniques alone.

Bibliographic details

Source: International Conference on Web Engineering, 2012, pp. 347-361 (15 pages)
Authors: Jakob G. Thomsen; Erik Ernst; Claus Brabrand; Michael Schwartzbach
Original format: PDF

Similar documents

1. Medical informatics labor market analysis using web crawling, web scraping, and text mining [J]. Jurgen Schedlbauer, Georgios Raptis, Bernd Ludwig. International Journal of Medical Informatics, 2021.

2. Use of Artificial Intelligence And Web Scraping Methods To Retrieve Information From The World Wide Web [J]. Marco Scarnò. International Journal of Engineering Research and Applications, 2018, no. 1.

3. Optimized Template Detection and Extraction Algorithm for Web Scraping of Dynamic Web Pages [J]. Xin Luo. Journal of Wavelet Theory and Applications, 2017, no. 2.

4. WebSelF: A Web Scraping Framework [C]. Jakob G. Thomsen, Erik Ernst, Claus Brabrand. International Conference on Web Engineering, 2012.

5. Brand positioning map and analysis using web scraping and advertisement analysis [D]. Surya Bhatt. 2015.

6. webTDat: A Web-Based Real-Time 3D Visualization Framework for Mesoscopic Whole-Brain Images [O]. Yuxin Li, Anan Li, Junhuai Li. 2020.

7. WebSelF: A Web Scraping Framework [O]. Jakob Thomsen, Erik Ernst, Claus Brabr. 2015.
