URL2Vec: URL Modeling with Character Embeddings for Fast and Accurate Phishing Website Detection

Abstract

A deep learning-based approach to phishing detection is proposed. Specifically, in the context of word2vec-based word embedding learning, websites' URLs are treated as documents and the characters in those URLs as words. Character embeddings can therefore be learned from a corpus of URLs in an unsupervised manner. We then combine these character embeddings with the structure of URLs to obtain vector representations of the URLs. In particular, a URL is partitioned into five sections: URL protocol, sub-domain name, domain name, domain suffix, and URL path. To identify phishing URLs, existing classification algorithms can be applied directly to these vector representations, avoiding the laborious manual and empirical design of effective features. For evaluation, we collect a large-scale dataset, the 1 Million Phishing Detection Dataset (1M-PD), which has been released for public use. Extensive experiments on two real-world datasets show the effectiveness of the proposed approach, which achieves an accuracy of 99.69% on the 1M-PD dataset, with a false positive rate of 0.40% and a true positive rate of 99.79%. Moreover, the proposed approach detects each URL in 32 ms on average on an ordinary personal computer, which is much faster than existing approaches and can even be considered real-time.
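The URL partitioning and vectorization described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The domain suffix is approximated as the last label of the host (a real system would consult the Public Suffix List). Character embeddings come from a seeded random table standing in for vectors that word2vec would learn from a URL corpus. Each section is represented by the average of its character embeddings, and the five section vectors are concatenated. The abstract does not say how character embeddings are combined within a section, so the averaging step is an assumption. All names (`split_url`, `toy_char_embeddings`, `section_vector`, `url_vector`, `DIM`) are hypothetical.

```python
import random
import string
from urllib.parse import urlsplit

# The five URL sections listed in the abstract, in a fixed order.
SECTIONS = ("protocol", "subdomain", "domain", "suffix", "path")
DIM = 8  # hypothetical embedding size, kept small for illustration


def split_url(url: str) -> dict:
    """Split a URL (assumed to include a scheme) into the five sections.

    Assumption: the suffix is just the last dot-separated host label
    (e.g. "com"); multi-label suffixes such as "co.uk" need the
    Public Suffix List.
    """
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    has_suffix = len(labels) > 1
    # Keep the query string with the path so it contributes characters too.
    path = parts.path + ("?" + parts.query if parts.query else "")
    return {
        "protocol": parts.scheme,
        "subdomain": ".".join(labels[:-2]),
        "domain": labels[-2] if has_suffix else labels[0],
        "suffix": labels[-1] if has_suffix else "",
        "path": path,
    }


def toy_char_embeddings(alphabet: str, dim: int = DIM, seed: int = 0) -> dict:
    """Hypothetical stand-in for word2vec-learned character embeddings."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in alphabet}


def section_vector(text: str, emb: dict, dim: int = DIM) -> list:
    """Average the embeddings of known characters (averaging is an assumption)."""
    vecs = [emb[ch] for ch in text if ch in emb]
    if not vecs:
        return [0.0] * dim  # empty section, e.g. no sub-domain
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def url_vector(url: str, emb: dict, dim: int = DIM) -> list:
    """Concatenate the five section vectors into one 5 * dim feature vector."""
    sections = split_url(url)
    vec = []
    for name in SECTIONS:
        vec.extend(section_vector(sections[name], emb, dim))
    return vec


if __name__ == "__main__":
    emb = toy_char_embeddings(string.printable)
    url = "http://login.paypal.com.evil.xyz/verify?id=1"
    print(split_url(url))
    # {'protocol': 'http', 'subdomain': 'login.paypal.com', 'domain': 'evil', 'suffix': 'xyz', 'path': '/verify?id=1'}
    print(len(url_vector(url, emb)))  # 40 = 5 sections * DIM
```

To learn real embeddings, each URL can be turned into a "sentence" of characters, e.g. `[list(u) for u in urls]`, and passed to a word2vec implementation such as gensim's `Word2Vec` (in gensim 4.x the dimension parameter is `vector_size`). Because the output has a fixed length regardless of URL length, it can be fed to any off-the-shelf classifier, which is the property the abstract relies on.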