
An Overview of Web Robots Detection Techniques


Abstract

Web robots, also called web crawlers, have become the major source of web traffic. While some robots, such as search engine crawlers, are well behaved, others can perform DDoS attacks, which pose serious threats to websites. Effectively detecting web robots benefits not only network traffic cleaning but also the cybersecurity of IoT-enabled systems and services. To capture the state of the art in web robot detection, this paper reviews the past decade of research on web robot/crawler detection techniques, compares their performance, and identifies the challenges of each technique, thus providing researchers with a reference for developing web robot detection in real applications. To protect web content from malicious web robots, researchers have investigated various approaches, which can be classified into three themes: offline web log analysis, honeypots, and online robot detection. We conclude that offline web log analysis methods achieve quite high accuracy, but they are time-consuming compared to online detection methods. Honeypots, as a computer security mechanism, can be used to engage and deceive hackers and to identify malicious activities performed over the Internet, but they may block legitimate robots. The review shows that a hybrid method performs better than an individual classifier, and that the performance of online web robot detection needs to be improved. In addition, different types of features can play different roles in different machine learning models. Therefore, feature selection is important for web robot/crawler detection.
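To make the offline web log analysis theme concrete, the following is a minimal sketch of per-client feature extraction from an access log, the kind of input a classifier could then be trained on. It is not taken from the reviewed paper: the log layout (Combined Log Format), the function name `extract_features`, the grouping by IP address and user agent, and the specific features are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Assumed input: Combined Log Format lines
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg")


def extract_features(lines):
    """Group requests by (IP, user agent) and compute illustrative features."""
    groups = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # silently skip lines that do not fit the assumed format
            groups[(match["host"], match["agent"])].append(match)

    features = {}
    for client, reqs in groups.items():
        n = len(reqs)
        features[client] = {
            "requests": n,
            # Well-behaved crawlers usually fetch robots.txt; browsers rarely do.
            "robots_txt": any(r["path"] == "/robots.txt" for r in reqs),
            "head_ratio": sum(r["method"] == "HEAD" for r in reqs) / n,
            "error_ratio": sum(r["status"].startswith("4") for r in reqs) / n,
            "image_ratio": sum(
                r["path"].lower().endswith(IMAGE_SUFFIXES) for r in reqs
            ) / n,
            "empty_referer_ratio": sum(r["referer"] in ("", "-") for r in reqs) / n,
        }
    return features


# Hypothetical sample log lines (documentation IP ranges, made-up agents).
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [10/Oct/2020:13:55:37 +0000] "HEAD /a.html HTTP/1.1" 200 0 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [10/Oct/2020:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2020:13:56:02 +0000] "GET /logo.png HTTP/1.1" 200 2048 "https://example.com/index.html" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for client, feats in extract_features(SAMPLE).items():
        print(client, feats)
```

Which of these features actually helps is model-dependent, which is why the feature selection step highlighted in the abstract matters; the dictionary above is only a starting set, and grouping by IP and user agent is a crude stand-in for proper session identification.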
