Unbiased Phishing Detection Using Domain Name Based Features

Abstract

Internet users face a barrage of phishing attacks of increasing frequency and sophistication. While these attacks have proven remarkably resilient against the wide range of defenses proposed by academia, industry, and research organizations, machine learning appears to be a promising approach for distinguishing phishing websites from legitimate ones. There are three main concerns with existing machine learning approaches to phishing detection. First, there is neither a framework, preferably open-source, for extracting features and keeping the dataset updated, nor an up-to-date dataset of phishing and legitimate websites. Second, a large number of features is used, and the choice of features selected to train the machine learning classifier lacks validating arguments. Third, the datasets used in the literature seem to be inadvertently biased with respect to features based on the URL or content.

In this thesis, we describe the implementation of our open-source, extensible framework for extracting features and creating an up-to-date phishing dataset. With this framework, named Fresh-Phish, we implemented 29 features that we used to detect whether a given website is legitimate or phishing: 26 features reported in related work and 3 new ones. We created a dataset of 6,000 websites with these features, of which 3,000 were malicious and 3,000 were genuine, and tested our approach on it. Using 6 different classifiers, we achieved an accuracy of 93%, which is reasonably high in this field.

To address the second and third concerns, we put forward the intuition that the domain name of a phishing website is the tell-tale sign of phishing and holds the key to successful phishing detection. We focus on this aspect of phishing websites and design features that explore the relationship of the domain name to the key elements of the website.
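
To make the idea of relating the domain name to page elements concrete, here is a minimal sketch in Python. It is not the thesis's feature set, which the abstract does not enumerate. The two features (`name_in_title`, `same_domain_link_ratio`), the helper `naive_domain`, and the example URLs are all hypothetical and chosen only for illustration. The domain extraction is deliberately naive and assumes a single-label public suffix.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def naive_domain(url):
    """Return the last two labels of the host name, lowercased.

    Assumption: ignores multi-label public suffixes such as "co.uk";
    a real system would consult the Public Suffix List.
    """
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


class PageParser(HTMLParser):
    """Collect the <title> text and every <a href> target."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def domain_features(url, html):
    """Hypothetical features relating the domain name to page elements."""
    parser = PageParser()
    parser.feed(html)
    domain = naive_domain(url)
    name = domain.split(".")[0]  # e.g. "example" from "example.com"

    # Resolve relative links against the page URL before comparing domains.
    link_domains = [naive_domain(urljoin(url, h)) for h in parser.links]
    same = sum(1 for d in link_domains if d == domain)

    return {
        "name_in_title": float(bool(name) and name in parser.title.lower()),
        "same_domain_link_ratio": same / len(link_domains) if link_domains else 0.0,
    }


# Made-up page that shows a brand name while being hosted on another domain.
PAGE = (
    "<html><head><title>PayPal Login</title></head><body>"
    '<a href="https://www.paypal.com/help">Help</a>'
    '<a href="/signin">Sign in</a>'
    "</body></html>"
)
suspicious = domain_features("http://secure-login.example.net/", PAGE)
genuine = domain_features("https://www.paypal.com/", PAGE)
```

In this toy case the same page scores differently depending on where it is hosted, because neither feature inspects the URL string's shape in isolation. Whether such features actually reduce dataset bias is a claim of the thesis that this sketch does not test.
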
Our work differs from the existing state of the art in that our feature set ensures minimal or no bias with respect to a dataset. Our learning model trains with only seven features and achieves a true positive rate of 98% and a classification accuracy of 97% on a sample dataset. Compared to state-of-the-art work, our per-instance processing and classification is 4 times faster for legitimate websites and 10 times faster for phishing websites. Importantly, we demonstrate the shortcomings of using URL-based features, as they are likely to be biased towards dataset collection and usage. We show the robustness of our learning algorithm by testing our classifiers on unknown live phishing URLs, achieving a detection accuracy of 99.7%, compared to the best previously known detection rate of 95%.
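
For readers unfamiliar with the two metrics quoted above, the following sketch shows how true positive rate and classification accuracy are conventionally computed from confusion-matrix counts, with phishing treated as the positive class. The counts in the example are invented for illustration and are not taken from the thesis.

```python
def true_positive_rate(tp, fn):
    """Fraction of actual phishing sites that were flagged as phishing."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Fraction of all sites, phishing and legitimate, classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Illustrative counts only (not results from the thesis):
# 100 phishing sites, 90 caught; 100 legitimate sites, 80 passed.
tp, fn = 90, 10
tn, fp = 80, 20

tpr = true_positive_rate(tp, fn)   # 90 / 100
acc = accuracy(tp, tn, fp, fn)     # 170 / 200
```

The two numbers can diverge: a classifier may catch most phishing sites (high true positive rate) while still misclassifying many legitimate ones, which lowers accuracy.
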

Bibliographic record

Author: Shirazi, Hossein
Author affiliation: Colorado State University
Degree-granting institution: Colorado State University
Subject: Computer science
Degree: M.S.
Year: 2018
Pages: 72
Original format: PDF
Language of text: English
