
Detecting phishing e-mails using text and data mining



Abstract

This paper uses text mining and data mining in tandem to detect phishing emails. The study employs Multilayer Perceptron (MLP), Decision Trees (DT), Support Vector Machine (SVM), Group Method of Data Handling (GMDH), Probabilistic Neural Net (PNN), Genetic Programming (GP) and Logistic Regression (LR) for classification. A dataset of 2500 phishing and non-phishing emails is analyzed after text mining is used to extract 23 keywords from the email bodies in the original dataset. Further, we selected the 12 most important features using t-statistic based feature selection. A t-test at the 1% level of significance showed no statistically significant difference in sensitivity between the results with and without feature selection for any technique except PNN. Since GP and DT are not statistically significantly different from each other, either with or without feature selection, at the 1% level of significance, DT should be preferred because it yields ‘if-then’ rules, thereby increasing the comprehensibility of the system.
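The t-statistic based feature selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the unequal-variance (Welch) form of the t-statistic is assumed here. The function names `t_statistic` and `select_top_k` and the toy keyword counts are hypothetical and are not taken from the paper. Each feature is scored by how strongly its mean differs between phishing (label 1) and non-phishing (label 0) emails, and the highest-scoring features are kept.

```python
import math
from statistics import mean, variance


def t_statistic(pos, neg):
    """Absolute two-sample t-statistic for one feature (Welch form, assumed)."""
    diff = abs(mean(pos) - mean(neg))
    # Standard error of the difference between the two class means.
    se = math.sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    if se == 0.0:
        # Both classes are constant: perfectly separated or identical.
        return math.inf if diff > 0 else 0.0
    return diff / se


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the largest t-statistic."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append((t_statistic(pos, neg), j))
    # Sort by descending score; break ties by feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Toy data (hypothetical): each row holds counts of 3 keywords in one email.
rows = [
    [3, 1, 1],
    [4, 0, 0],
    [5, 1, 2],
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = phishing, 0 = non-phishing

selected = select_top_k(rows, labels, k=2)
print(selected)
```

In the setting the abstract describes, `rows` would hold the 23 keyword features per email and `k` would be 12. The reduced feature set would then be passed to each classifier.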
