Journal: Procedia Computer Science

Lightweight URL-based phishing detection using natural language processing transformers for mobile devices

Abstract

Hackers are increasingly launching phishing attacks via SMS and social media, and games and dating apps introduce yet another attack vector. However, current deep learning-based phishing detection applications are impractical on mobile devices because of their computational burden. We propose a lightweight phishing detection algorithm for mobile devices that distinguishes phishing websites from legitimate ones using URLs alone. As a baseline, we apply Artificial Neural Networks (ANNs) to URL-based and HTML-based website features. A model search yields 15 ANN models with accuracies >96%, comparable to state-of-the-art approaches. Next, we test deep ANNs on URL-based features only; all of these models perform poorly, with a highest accuracy of 86.2%, indicating that URL-based features alone are not adequate for detecting phishing websites even with deep ANNs. Since language transformers learn to represent context-dependent text sequences, we hypothesize that they can learn directly from the text in URLs to distinguish between legitimate and malicious websites. We apply two state-of-the-art deep transformers (BERT and ELECTRA) to phishing detection. Testing custom and standard vocabularies, we find that pre-trained transformers available for immediate use (with fine-tuning) outperform the model trained with the custom URL-based vocabulary. Using pre-trained transformers to predict phishing websites from URLs alone has four advantages: it 1) requires little training time (~8 minutes), 2) is more easily updatable than feature-based approaches because no pre-processing of URLs is required, 3) is safer to use because phishing websites can be predicted without physically visiting the malicious sites, and 4) is easily deployable for real-time detection and can run on mobile devices.
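
To make the contrast between "URL-based features" and learning "directly from the text in URLs" concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' code: the feature names in `url_features`, the splitting rule in `url_tokens`, and the `build_vocab` helper are all hypothetical illustrations. In particular, the tokenizer is a simple regular-expression stand-in, not the WordPiece tokenization that BERT and ELECTRA actually use, and the abstract does not state which features or vocabulary construction the paper employed.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Hand-crafted lexical features computed from the URL string alone.

    The feature set is illustrative (an assumption), not the paper's.
    """
    # urlsplit only recognises the host when a scheme is present.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return {
        "length": len(url),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at": "@" in url,
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "uses_https": parts.scheme == "https",
    }


def url_tokens(url: str) -> list:
    """Split a URL into lowercase alphanumeric runs and single punctuation marks."""
    return re.findall(r"[a-z0-9]+|[^a-z0-9\s]", url.lower())


def build_vocab(urls, min_count: int = 1) -> dict:
    """Build a token-to-id map from a URL corpus (a hypothetical custom vocabulary)."""
    counts = Counter(tok for u in urls for tok in url_tokens(u))
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for tok, count in counts.most_common():
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab
```

The first function corresponds to the feature-based route, where each URL is reduced to a fixed set of numbers before a classifier sees it. The other two correspond to the text-based route, where the URL is kept as a token sequence and a model learns its own representation; with a pre-trained transformer, the vocabulary and tokenizer would come from the pre-trained model rather than from a helper like `build_vocab`.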