Journal of Computing and Information Technology

A Phishing Webpage Detection Method Based on Stacked Autoencoder and Correlation Coefficients


Abstract

Phishing is a cyber-attack that tricks unsuspecting online users into revealing sensitive information. Many anti-phishing solutions have been proposed to date, including blacklists and whitelists, heuristic-based methods, and machine learning-based methods. Nevertheless, users are still being lured into disclosing sensitive information on phishing websites. In this paper, we propose a novel phishing webpage detection model that uses a deep learning method, the Stacked Autoencoder (SAE). The model relies on features extracted from the URL, the HTML source code, and third-party services to represent the basic characteristics of phishing webpages. To bring the features to the same order of magnitude, three normalization methods are adopted. In particular, we propose a method that calculates correlation coefficients between the weight matrices of the SAE to determine the optimal width of the hidden layers, and it proves computationally efficient and feasible. In tests on a set of phishing and benign webpages, the SAE-based model achieves the best performance compared with other algorithms such as Naive Bayes (NB), Support Vector Machine (SVM), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN). These results indicate that the proposed detection model is promising and can be applied effectively to phishing detection.
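The abstract names neither the three normalization methods nor the matrices whose correlation is measured, so the following is only a rough sketch under stated assumptions. It shows two common normalizations (min-max and z-score, chosen here for illustration and not confirmed by the paper) and a Pearson correlation coefficient between two equally shaped weight matrices. The function names `min_max`, `z_score`, and `weight_correlation` are hypothetical, and how the coefficient would drive the choice of hidden-layer width is not reproduced here.

```python
import numpy as np


def min_max(x):
    """Scale each feature column to [0, 1] (assumed normalization, not from the paper)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    # Avoid division by zero: constant columns keep a span of 1 and map to 0.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span


def z_score(x):
    """Shift each feature column to zero mean and unit variance (assumed normalization)."""
    mu, sigma = x.mean(axis=0), x.std(axis=0)
    # Constant columns have sigma == 0; leave them centered at 0.
    sigma = np.where(sigma > 0, sigma, 1.0)
    return (x - mu) / sigma


def weight_correlation(w_a, w_b):
    """Pearson correlation between two same-shaped weight matrices, flattened.

    Hypothetical helper: which SAE weight matrices the paper compares
    is not stated in the abstract.
    """
    if w_a.shape != w_b.shape:
        raise ValueError("weight matrices must have the same shape")
    return float(np.corrcoef(w_a.ravel(), w_b.ravel())[0, 1])
```

A coefficient near 1 or -1 means the two matrices are close to linearly related, while a value near 0 means they carry largely unrelated weights.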
