Lightweight URL-based phishing detection using natural language processing transformers for mobile devices

Katherine Haynes; Hossein Shirazi; Indrakshi Ray

Abstract

Hackers are increasingly launching phishing attacks via SMS and social media, and games and dating apps introduce yet another attack vector. However, current deep learning-based phishing detection applications are not applicable to mobile devices due to their computational burden. We propose a lightweight phishing detection algorithm for mobile devices that distinguishes phishing from legitimate websites using URLs alone. As a performance baseline, we apply Artificial Neural Networks (ANNs) to URL-based and HTML-based website features. A model search yields 15 ANN models with accuracies >96%, comparable to state-of-the-art approaches. Next, we test deep ANNs on URL-based features only; all models perform poorly, with a highest accuracy of 86.2%, indicating that URL-based features alone are not adequate to detect phishing websites even with deep ANNs. Since language transformers learn to represent context-dependent text sequences, we hypothesize that they can learn directly from the text in URLs to distinguish between legitimate and malicious websites. We apply two state-of-the-art deep transformers (BERT and ELECTRA) to phishing detection. Testing custom and standard vocabularies, we find that pre-trained transformers available for immediate use (with fine-tuning) outperform the model trained with the custom URL-based vocabulary. Using pre-trained transformers to predict phishing websites from URLs alone has four advantages: 1) it requires little training time (~8 minutes); 2) it is more easily updatable than feature-based approaches because no pre-processing of URLs is required; 3) it is safer to use because phishing websites can be predicted without visiting the malicious sites; and 4) it is easily deployable for real-time detection and suitable for running on mobile devices.
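To make the idea of feeding raw URL text to a transformer concrete, the following is a minimal sketch of WordPiece-style tokenization, the greedy longest-match-first scheme that BERT-style models use to turn text into vocabulary pieces. It is not the authors' code: the tiny `VOCAB` set, the helper names (`split_url`, `wordpiece`, `tokenize_url`), and the example URL are hypothetical and chosen only for illustration. A real pre-trained model ships its own vocabulary with tens of thousands of entries.

```python
import re

# Toy vocabulary (hypothetical, for illustration only). "##" marks a piece
# that continues a word rather than starting one.
VOCAB = {
    "http", "https", "www", "com", "login", "secure", "pay",
    "##pal", "##1", "account", "verify",
    ":", "/", ".", "-", "[UNK]",
}


def split_url(url):
    """Lowercase the URL; split into alphanumeric runs and single punctuation marks."""
    return re.findall(r"[a-z0-9]+|[^a-z0-9\s]", url.lower())


def wordpiece(word, vocab):
    """Segment one word into vocabulary pieces, always taking the longest match first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return ["[UNK]"]
        pieces.append(match)
        start = end
    return pieces


def tokenize_url(url, vocab=VOCAB):
    """Tokenize a URL string into a flat list of vocabulary pieces."""
    tokens = []
    for word in split_url(url):
        tokens.extend(wordpiece(word, vocab))
    return tokens


print(tokenize_url("http://secure-paypal1.com/login"))
# ['http', ':', '/', '/', 'secure', '-', 'pay', '##pal', '##1', '.', 'com', '/', 'login']
```

In this sketch the lookalike host name `paypal1` is broken into the pieces `pay`, `##pal`, and `##1`, so a model sees sub-word structure without any hand-built URL features. In practice the resulting token sequence would be mapped to IDs and passed to a fine-tuned classifier, a step this sketch leaves out.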

Bibliographic details

Source: Procedia Computer Science, 2021, issue a, 8 pages
Authors: Katherine Haynes; Hossein Shirazi; Indrakshi Ray
Original format: PDF
Keywords: Cybersecurity; Social Engineering Attack; Phishing Detection; Deep learning; Neural Networks; Natural Language Processing (NLP); BERT; ELECTRA; Mobile Application

Similar literature

1. Phishing Email Detection Using Natural Language Processing Techniques: A Literature Survey [J]. Said Salloum, Tarek Gaber, Sunil Vadera. Procedia Computer Science. 2021, issue a.
2. Optimization of URL-Based Phishing Websites Detection through Genetic Algorithms [J]. Muhammad Taseer Suleman, Shahid Mahmood Awan. Automatic Control and Computer Sciences. 2019, issue 4.
3. PhishDump: A multi-model ensemble based technique for the detection of phishing sites in mobile devices [J]. Rao Routhu Srinivasa, Vaishnavi Tatti, Pais Alwyn Roshan. Pervasive and Mobile Computing. 2019.
4. Text message corpus: Applying natural language processing to mobile device forensics [C]. ODay Daniel R., Calix Ricardo A. IEEE International Conference on Multimedia and Expo Workshops. 2013.
5. URL-based Phishing Detection using Entropy of Non-Alphanumeric Characters [D]. Eint Sandi Aung. 2019.
6. Natural Language Processing for Lines and Devices in Portable Chest X-Rays [O]. Daniel Rubin, Dan Wang, Dallas A. Chambers. 2010.
7. Bypassing Detection of URL-based Phishing Attacks Using Generative Adversarial Deep Neural Networks [O]. Ahmed AlEroud, George Karabatis. 2020.
