Prevent XPath and CSS Based Scrapers by Using Markup Randomizer

Ahmed Diab; Tawfiq Barhoum


Abstract

Web scraping may be considered an act of data theft, and several researchers have introduced approaches for addressing it. These solutions solve the problem only partially, and some cannot be applied to modern web techniques. In this work we introduce a new approach, called Markup Randomizer, that stops web scraping efficiently and is applicable to modern web techniques: it randomly changes the HTML and CSS of a page at timed intervals while keeping the markup valid. The main advantage of our model is that any web page can use it without extra effort and without restrictions on the web site's markup. Experiments were run on a collected dataset of 30 websites divided into three categories: News, Currency Rates and Weather. The proposed model, based on Markup Randomizer, was applied to this dataset. The aim of the experiments was to measure similarity, file size and time. In testing the proposed model, we found that the markup changed by up to 50%, and that the file size changed and was optimized during the process. The time required to apply the model and generate the new markup was acceptable, at up to 2 minutes. We conclude that the proposed Markup Randomizer is acceptable.
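The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general idea, under stated assumptions: class names in the HTML are replaced with random tokens and the CSS selectors are rewritten to match, so the page renders the same while XPath or CSS selectors written against the old names stop matching. The function names (random_token, randomize_markup, similarity), the regular-expression based rewriting and the use of difflib as a similarity measure are all hypothetical choices made for illustration, not the authors' implementation.

```python
import random
import re
import string
from difflib import SequenceMatcher


def random_token(rng, used, length=8):
    """Return an unused random identifier that starts with a letter."""
    alphabet = string.ascii_lowercase + string.digits
    while True:
        token = rng.choice(string.ascii_lowercase) + "".join(
            rng.choice(alphabet) for _ in range(length - 1)
        )
        if token not in used:
            used.add(token)
            return token


def randomize_markup(html, css, seed=None):
    """Rename every class found in `html` and rewrite `css` to match.

    Assumption: classes appear only as double-quoted class="..." attributes
    and as plain .name selectors; a real tool would need a proper parser.
    """
    rng = random.Random(seed)
    mapping, used = {}, set()

    def rename(name):
        if name not in mapping:
            mapping[name] = random_token(rng, used)
        return mapping[name]

    def rewrite_class_attr(match):
        names = match.group(2).split()
        return match.group(1) + " ".join(rename(n) for n in names) + match.group(3)

    new_html = re.sub(r'(class=")([^"]*)(")', rewrite_class_attr, html)

    def rewrite_selector(match):
        name = match.group(1)
        # Leave untouched anything that is not a known class (e.g. ".png").
        return "." + mapping[name] if name in mapping else match.group(0)

    new_css = re.sub(r"\.([A-Za-z_][\w-]*)", rewrite_selector, css)
    return new_html, new_css, mapping


def similarity(before, after):
    """Character-level similarity ratio in [0, 1] (a stand-in metric)."""
    return SequenceMatcher(None, before, after).ratio()


if __name__ == "__main__":
    page = '<div class="price box"><span class="price">1.25</span></div>'
    style = ".price { color: red; } .box .price { margin: 0; }"
    new_page, new_style, names = randomize_markup(page, style)
    print(new_page)
    print(new_style)
    print(round(similarity(page, new_page), 2))
```

Calling randomize_markup again at a fixed interval would produce a fresh set of names each time, which is one way to read the abstract's "randomly in timely manner"; a scraper that hard-codes a selector such as .price would then need to be rewritten after every regeneration.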


Bibliographic details

Source: International Arab Journal of e-Technology, 2018, Issue 2, 10 pages
Authors: Ahmed Diab; Tawfiq Barhoum
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: Anti-Scraper; Anti-Data Theft; Web Scrapers

