Phishing website detection using Latent Dirichlet Allocation and AdaBoost

Abstract

One way criminals steal identities in cyberspace is through phishing. Attackers host phishing websites that resemble legitimate websites and entice users to click on hyperlinks that direct them to these fake sites. Attackers use the fake sites to capture personal information, such as logins, passwords, and social security numbers, from innocent victims, which they later use to commit crimes. We propose a robust methodology for detecting phishing websites that employs a topic modeling technique, Latent Dirichlet Allocation (LDA), for semantic analysis and AdaBoost for classification. The methodology is a content-driven approach that is device independent and language neutral. The website content served to mobile and desktop clients is collected with an intelligent web crawler. Website content that is not in English is translated into English using Google's language translator. The topic model is built from the translated content of both desktop and mobile clients. The phishing website classifier is built using (i) the topic distribution probabilities found by LDA as features and (ii) the AdaBoost voting technique. Experiments were conducted on a large public corpus of website data containing 47,500 phishing websites and 52,500 good websites. Results show that our method achieves an F-measure of 99%.
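The classification stage described above (LDA topic distributions used as features, an AdaBoost weighted vote over weak learners, and evaluation by F-measure) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: LDA is fitted here with collapsed Gibbs sampling, AdaBoost uses single-feature decision stumps as its weak learners, and the crawler and translation steps are replaced by a hypothetical toy corpus built from two made-up vocabularies. All function names (`lda_gibbs`, `train_adaboost`, `predict`, `f_measure`, `demo`) and hyperparameters (`alpha`, `beta`, iteration and round counts, three topics) are illustrative choices, not values taken from the paper.

```python
# Illustrative sketch (hypothetical names and hyperparameters), not the paper's code.
import math
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA; returns each document's topic distribution."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # current topic of every token
    for d, doc in enumerate(docs):            # random initial assignment
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1; nkw[k][vocab[w]] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                ndk[d][k] -= 1; nkw[k][v] -= 1; nk[k] -= 1  # remove token's assignment
                # Full conditional p(z = t | all other assignments), up to a constant
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][v] += 1; nk[k] += 1
    # Smoothed topic proportions theta_d: the feature vector for the classifier
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
            for d, doc in enumerate(docs)]


def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost with single-feature threshold stumps; y holds +1/-1 labels."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None  # (weighted error, feature, threshold, sign)
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for sign in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if (sign if xi[j] >= thr else -sign) != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        if err >= 0.5:  # no stump beats chance; stop boosting
            break
        e = max(err, 1e-10)  # guard against log(1/0) for a perfect stump
        a = 0.5 * math.log((1 - e) / e)
        ensemble.append((a, j, thr, sign))
        if err == 0:
            break
        # Up-weight misclassified samples, down-weight correct ones, renormalize
        w = [wi * math.exp(-a * yi * (sign if xi[j] >= thr else -sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps (the AdaBoost voting step)."""
    score = sum(a * (sign if x[j] >= thr else -sign) for a, j, thr, sign in ensemble)
    return 1 if score >= 0 else -1


def f_measure(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive (phishing) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def demo(seed=0):
    """Hypothetical toy corpus: even-indexed pages are 'phishing' (+1), odd are 'good' (-1)."""
    rng = random.Random(seed)
    phish = ["verify", "account", "password", "login", "suspended", "bank", "confirm", "urgent"]
    good = ["recipe", "weather", "sports", "news", "travel", "music", "garden", "review"]
    docs = [[rng.choice(phish if i % 2 == 0 else good) for _ in range(20)] for i in range(40)]
    labels = [1 if i % 2 == 0 else -1 for i in range(40)]
    theta = lda_gibbs(docs, n_topics=3, seed=seed)  # unsupervised: labels unused
    model = train_adaboost(theta[:30], labels[:30])
    preds = [predict(model, row) for row in theta[30:]]
    return f_measure(labels[30:], preds)


if __name__ == "__main__":
    print(f"F-measure on 10 held-out toy pages: {demo():.2f}")
```

For brevity the topic model is fitted on all 40 toy pages (without their labels) before the train/test split; a stricter evaluation would fit it on the training pages only and infer topic distributions for the held-out pages. For real data, library implementations such as scikit-learn's `LatentDirichletAllocation` and `AdaBoostClassifier` would usually be preferable to these hand-rolled versions, and the toy score bears no relation to the 99% F-measure reported above.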