...
Journal of Network and Computer Applications

Bot recognition in a Web store: An approach based on unsupervised learning

Abstract

Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots) that pose a threat to website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, traffic generated by legitimate users must be accurately separated from traffic generated by Web bots. This paper proposes a machine learning solution to the problem of classifying sessions as bot or human, with a specific application to e-commerce. The approach studied in this work uses unsupervised learning (k-means and Graded Possibilistic c-Means) followed by supervised labelling of the resulting clusters, a generative learning strategy that decouples modelling the data from labelling them. Its effectiveness is evaluated through experiments on real e-commerce data under realistic conditions and compared with that of supervised learning classifiers (a multi-layer perceptron neural network and a support vector machine). The results show that classification based on unsupervised learning is very effective, achieving performance similar to that of fully supervised classification. This provides an experimental indication that the bot recognition problem can be solved successfully with methods that are less sensitive to mislabelled data or missing labels. A very small fraction of sessions remain misclassified in both cases, so an in-depth analysis of the misclassified samples was also performed. This analysis revealed an advantage of the proposed approach: it correctly recognized more bots and identified more camouflaged agents that had been erroneously labelled as humans.
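The cluster-then-label strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain k-means (Lloyd's algorithm) with a deterministic farthest-first initialisation, names each cluster by majority vote of the labelled training sessions (a hypothetical labelling rule), and gives a new session the label of its nearest centroid. The two session features and the toy data are hypothetical; real features would normally be normalised first, and Graded Possibilistic c-Means could take the place of k-means.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; it never looks at labels (the unsupervised step)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        updated = [
            [sum(dim) / len(members) for dim in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # centroids unchanged: converged
            break
        centroids = updated
    return centroids


def label_clusters(points, labels, centroids, default="human"):
    """Supervised step (hypothetical rule): name each cluster by majority vote
    of the labelled sessions that fall into it; `default` covers empty clusters."""
    votes = [Counter() for _ in centroids]
    for p, y in zip(points, labels):
        votes[nearest(p, centroids)][y] += 1
    return [v.most_common(1)[0][0] if v else default for v in votes]


def classify(session, centroids, cluster_labels):
    """A new session inherits the label of its nearest cluster."""
    return cluster_labels[nearest(session, centroids)]


# Hypothetical session features: (mean seconds between requests, share of image requests)
sessions = [(8.0, 0.7), (10.0, 0.6), (9.0, 0.8), (0.5, 0.0), (0.4, 0.1), (0.6, 0.05)]
labels = ["human", "human", "human", "bot", "bot", "bot"]

centroids = kmeans(sessions, k=2)
cluster_labels = label_clusters(sessions, labels, centroids)
print(classify((7.5, 0.65), centroids, cluster_labels))  # human
print(classify((0.3, 0.0), centroids, cluster_labels))   # bot
```

Because `kmeans` never sees `labels`, a few mislabelled sessions can change a cluster's name only if they outvote the correctly labelled sessions in that cluster. This is one way to read the abstract's point that decoupling data modelling from labelling makes the method less sensitive to mislabelled or missing labels.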