Unbiased Phishing Detection Using Domain Name Based Features

Abstract

Internet users face a barrage of phishing attacks of increasing frequency and sophistication. While these attacks have been remarkably resilient against the vast range of defenses proposed by academia, industry, and research organizations, machine learning appears to be a promising approach for distinguishing phishing websites from legitimate ones. There are three main concerns with existing machine learning approaches to phishing detection. The first is that there is neither a framework, preferably open-source, for extracting features and keeping the dataset updated, nor an up-to-date dataset of phishing and legitimate websites. The second is the large number of features used and the lack of validating arguments for the choice of features selected to train the machine learning classifier. The last concerns the datasets used in the literature, which seem to be inadvertently biased with respect to features based on the URL or content.

In this thesis, we describe the implementation of our open-source and extensible framework for extracting features and creating an up-to-date phishing dataset. With this framework, named Fresh-Phish, we implemented 29 different features that we used to detect whether a given website is legitimate or phishing. We used 26 features that were reported in related work and added 3 new ones. We created a dataset of 6,000 websites with these features, of which 3,000 were malicious and 3,000 were genuine, and tested our approach on it. Using 6 different classifiers, we achieved an accuracy of 93%, which is reasonably high in this field.

To address the second and third concerns, we put forward the intuition that the domain name of a phishing website is the tell-tale sign of phishing and holds the key to successful phishing detection. We focus on this aspect of phishing websites and design features that explore the relationship of the domain name to the key elements of the website. Our work differs from the existing state of the art in that our feature set ensures minimal or no bias with respect to a dataset. Our learning model trains with only seven features and achieves a true positive rate of 98% and a classification accuracy of 97% on a sample dataset. Compared to state-of-the-art work, our per-instance processing and classification is 4 times faster for legitimate websites and 10 times faster for phishing websites. Importantly, we demonstrate the shortcomings of using features based on URLs, as they are likely to be biased towards dataset collection and usage. We show the robustness of our learning algorithm by testing our classifiers on unknown live phishing URLs, achieving a detection accuracy of 99.7%, compared to the earlier known best result of a 95% detection rate.
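
To make the idea of relating the domain name to key elements of the website more concrete, the following is a minimal sketch in Python. It is not the thesis's feature set: the abstract does not list the seven features, so the two features here (`domain_in_title` and `same_domain_link_ratio`), the helper names, and the naive domain extraction are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def domain_label(url):
    """Naive guess at the registrable name: the second-to-last host label.

    Assumption: this ignores multi-part suffixes such as .co.uk; a real
    extractor would consult the Public Suffix List.
    """
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    return parts[-2] if len(parts) >= 2 else host


class _PageScan(HTMLParser):
    """Collects the <title> text and every <a href> value on the page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def domain_features(url, html):
    """Two hypothetical features relating the domain name to page content."""
    name = domain_label(url)
    scan = _PageScan()
    scan.feed(html)
    # Only absolute links carry a hostname that can be compared.
    absolute = [h for h in scan.links if urlparse(h).hostname]
    same = [h for h in absolute if domain_label(h) == name]
    return {
        "domain_in_title": bool(name) and name in scan.title.lower(),
        "same_domain_link_ratio": len(same) / len(absolute) if absolute else 0.0,
    }
```

The intuition this sketch tries to capture is that a legitimate page tends to mention its own domain name and link back to it, whereas a phishing page imitating a brand is hosted under an unrelated domain. Features of this kind depend on the page's relationship to its domain rather than on surface properties of the URL string, which is one plausible reading of why they would be less sensitive to how a dataset was collected.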

Bibliographic details

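  • Author

    Shirazi, Hossein

  • Author affiliation

    Colorado State University

  • Degree-granting institution: Colorado State University
  • Subject: Computer science
  • Degree: M.S.
  • Year: 2018
  • Pagination: 72 p.
  • Total pages: 72
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification:
  • Keywords: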
