HTMLPhish: Enabling Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis


Abstract

Developing and launching phishing attacks now requires little technical skill or cost. This has led to an ever-growing number of phishing attacks on the World Wide Web, so proactive techniques to fight them have become necessary. In this paper, we propose HTMLPhish, a deep learning based, data-driven, end-to-end approach to automatic phishing web page classification. Specifically, HTMLPhish receives the content of the HTML document of a web page and employs Convolutional Neural Networks (CNNs) to learn the semantic dependencies in the textual contents of the HTML. The CNNs learn appropriate feature representations from the HTML document embeddings without extensive manual feature engineering. Furthermore, concatenating word and character embeddings allows the model to handle features it has not seen before and to generalize more easily to test data. We conduct comprehensive experiments on a dataset of more than 50,000 HTML documents whose distribution of phishing to benign web pages reflects what is found in the real world, and HTMLPhish achieves over 93% accuracy and true positive rate. HTMLPhish is also a completely language-independent, client-side strategy, so it can detect phishing web pages regardless of the language of their text.


Bibliographic details

Source
International Joint Conference on Neural Networks | 2020 | pp. 1-8 | 8 pages
Authors
Chidimma Opara; Bo Wei; Yingke Chen
Original format: PDF
Keywords
Phishing; Feature extraction; Web pages; Machine learning; Neural networks; Uniform resource locators; Data models


Similar literature


1. A Deep Learning Technique for Web Phishing Detection Combined URL Features and Visual Similarity [J]. Saad Al-Ahmadi, Yasser Alharbi. International Journal of Computer Networks & Communications. 2020, No. 5
2. Web Phishing Detection Using a Deep Learning Framework [J]. Yi Ping, Guan Yuxiang, Zou Futai. Wireless Communications & Mobile Computing. 2018, No. 1
3. Applying Data Mining Techniques in Intrusion Detection System on Web and Analysis of Web Usage [J]. Alaa H. Al-Hamami, Mohammad Ala'a Al-Hamami, Soukaena Hassan Hasheem. Information Technology Journal. 2006, No. 1
4. Machine Learning Techniques for Detection of Website Phishing: A Review for Promises and Challenges [C]. Ammar Odeh, Ismail Keshta, Eman Abdelfattah. Annual Computing and Communication Workshop and Conference. 2021
5. Learning Early-Stage Web Development at Scale: Exploring Methods to Assess Learning Through Analysis of HTML and CSS [D]. Kim, Meen Chul. 2021
6. A Deep-Learning-Driven Light-Weight Phishing Detection Sensor [O]. Bo Wei, Rebeen Ali Hamad, Longzhi Yang. 2019
7. Phishing website detection using machine learning and deep learning techniques [O]. M Selvakumari, M Sowjanya, Sneha Das. 2021
