International Joint Conference on Neural Networks

HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis

Abstract

Recently, developing and launching phishing attacks has come to require little technical skill and cost. This lowered barrier has led to an ever-growing number of phishing attacks on the World Wide Web. Consequently, proactive techniques to fight phishing attacks have become essential. In this paper, we propose HTMLPhish, a data-driven, end-to-end deep learning approach for automatically classifying phishing web pages. Specifically, HTMLPhish takes the content of a web page's HTML document as input and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual content of the HTML. The CNNs learn appropriate feature representations from embeddings of the HTML document without extensive manual feature engineering. Furthermore, our approach of concatenating word and character embeddings allows the model to handle new features and to extrapolate easily to test data. We conduct comprehensive experiments on a dataset of more than 50,000 HTML documents whose proportion of phishing to benign web pages reflects the distribution found in the real world, and HTMLPhish achieves over 93% accuracy and true positive rate. HTMLPhish is also a fully language-independent, client-side strategy, so it can detect phishing web pages regardless of the language of their text.
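To make the described pipeline more concrete, here is a minimal, untrained sketch in Python of concatenating word-level and character-level embeddings for HTML tokens, then applying a 1-D convolution with global max-pooling and a sigmoid output. It is not the HTMLPhish implementation: the tokenizer, embedding sizes, kernel width, random weights, and every name (`tokenize`, `Embeddings`, `embed`, `conv_maxpool`, `phishing_score`) are hypothetical assumptions. The sketch only illustrates why the concatenation can help with unseen tokens: an out-of-vocabulary word falls back to a shared zero word vector, but its character half still distinguishes it from other unseen words.

```python
import math
import random
import re
import string

# Illustrative hyperparameters (assumptions, not values from the paper).
WORD_DIM, CHAR_DIM, KERNEL, FILTERS = 4, 4, 3, 2


def tokenize(html):
    """Hypothetical tokenizer: tag openers/closers, words, single symbols."""
    return re.findall(r"</?\w+|\w+|[^\s\w]", html.lower())


class Embeddings:
    """Random (untrained) lookup table; unknown keys map to a shared zero vector."""

    def __init__(self, vocab, dim, rng):
        # sorted() keeps the random assignment deterministic for a given seed
        self.table = {k: [rng.uniform(-1, 1) for _ in range(dim)] for k in sorted(vocab)}
        self.unk = [0.0] * dim

    def lookup(self, key):
        return self.table.get(key, self.unk)


def embed(tokens, word_emb, char_emb):
    """Concatenate each token's word vector with the mean of its character vectors."""
    rows = []
    for tok in tokens:
        chars = [char_emb.lookup(c) for c in tok]
        char_vec = [sum(col) / len(chars) for col in zip(*chars)]
        rows.append(word_emb.lookup(tok) + char_vec)  # WORD_DIM + CHAR_DIM features
    return rows


def conv_maxpool(rows, filters):
    """1-D convolution over token positions, ReLU, then global max-pooling."""
    dim = WORD_DIM + CHAR_DIM
    if len(rows) < KERNEL:  # zero-pad documents shorter than the kernel
        rows = rows + [[0.0] * dim for _ in range(KERNEL - len(rows))]
    pooled = []
    for weights, bias in filters:
        best = 0.0  # max over ReLU(z) equals max(0, max z)
        for i in range(len(rows) - KERNEL + 1):
            z = bias + sum(
                weights[k][d] * rows[i + k][d]
                for k in range(KERNEL)
                for d in range(dim)
            )
            best = max(best, z)
        pooled.append(best)
    return pooled


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def phishing_score(html, word_emb, char_emb, filters, out_w, out_b=0.0):
    """Map an HTML string to a score in (0, 1); higher means 'more phishing-like'."""
    features = conv_maxpool(embed(tokenize(html), word_emb, char_emb), filters)
    return sigmoid(out_b + sum(w * f for w, f in zip(out_w, features)))


if __name__ == "__main__":
    rng = random.Random(0)
    train_html = '<form action="login.php"><input type="password"></form>'
    word_emb = Embeddings(set(tokenize(train_html)), WORD_DIM, rng)
    alphabet = string.ascii_lowercase + string.digits + string.punctuation
    char_emb = Embeddings(set(alphabet), CHAR_DIM, rng)
    dim = WORD_DIM + CHAR_DIM
    filters = [
        ([[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(KERNEL)], 0.0)
        for _ in range(FILTERS)
    ]
    out_w = [rng.uniform(-1, 1) for _ in range(FILTERS)]
    html = '<a href="verify.html">Verify your account</a>'
    score = phishing_score(html, word_emb, char_emb, filters, out_w)
    print(f"untrained phishing score: {score:.3f}")
```

Because the weights are random, the printed score is meaningless; a real model would learn the embedding tables and convolution filters from labeled phishing and benign HTML, typically in a deep-learning framework. Pure Python is used here only to keep the sketch dependency-free.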