Conference: International Conference on Security and Cryptography

A Machine-learning based Unbiased Phishing Detection Approach


Abstract

Phishing websites mimic legitimate websites to capture users' sensitive information. Machine learning is often used to detect phishing websites. Current machine-learning-based approaches classify sites into two groups, phishing and genuine, based on a set of features. We argue that this models the problem inadequately, because the characteristics of different phishing websites may vary widely. Moreover, the current approaches are biased towards groups of over-represented samples. Most importantly, as new features are exploited, the training set must be updated to detect new phishing sites. The time lag between the evolution of new phishing sites and the retraining of the model can be exploited by attackers. We provide an alternative approach that aims to solve the above-mentioned problems. Instead of finding commonalities among unrelated genuine websites, we measure the similarity of a suspicious website to a legitimate target and use machine learning to decide whether the suspicious site is impersonating the target. We define the fingerprint of a legitimate website using visual and textual characteristics; a sample is compared against this fingerprint to ascertain whether it is fake. We implemented our approach on 14 legitimate websites and tested it against 1446 unique samples. Our model reported an accuracy of at least 98% and is not biased towards any website. This is in contrast to current machine-learning models, which may be biased towards groups of over-represented samples and lead to more false-negative errors for less popular websites.
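
The target-centred comparison described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify which visual or textual characteristics form the fingerprint, so the token counts, the set of dominant colours, the helper names (`make_fingerprint`, `similarity_features`, `looks_like_target`) and the 0.6 threshold are all assumptions chosen for illustration. In particular, the fixed threshold only stands in for the trained machine-learning model that the paper uses to make the final decision.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(a, b):
    """Jaccard overlap between two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def make_fingerprint(page_text, dominant_colors):
    """Hypothetical fingerprint: token counts (textual) and colours (visual)."""
    return {
        "tokens": Counter(tokenize(page_text)),
        "colors": set(dominant_colors),
    }


def similarity_features(target_fp, sample_fp):
    """Return (textual similarity, visual similarity) of a sample to a target."""
    return (
        cosine_similarity(target_fp["tokens"], sample_fp["tokens"]),
        jaccard(target_fp["colors"], sample_fp["colors"]),
    )


def looks_like_target(features, threshold=0.6):
    """Assumed stand-in for a trained classifier: mean similarity vs. threshold."""
    return sum(features) / len(features) >= threshold


# Usage with made-up pages: one fingerprint per legitimate target.
target = make_fingerprint("Sign in to Example Bank online banking",
                          ["#003366", "#ffffff"])
clone = make_fingerprint("Sign in to Example Bank online banking now",
                         ["#003366", "#ffffff"])
other = make_fingerprint("Fresh recipes for weeknight dinners",
                         ["#ff9900", "#ffffff"])

print(looks_like_target(similarity_features(target, clone)))  # True
print(looks_like_target(similarity_features(target, other)))  # False
```

Because each sample is scored against one target's fingerprint rather than against a pooled class of "genuine" sites, adding a new target in this sketch only requires a new fingerprint, which is the property the abstract contrasts with two-class models.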
