Conference: Iberian Conference on Information Systems and Technologies

The investigation on the effect of feature vector dimension for spam email detection with a new framework

Abstract

This study investigates how the dimension of the feature vector affects the classification of Turkish e-mails as spam or legitimate. Although hundreds of experimental studies have been carried out, especially for English, which is a non-agglutinative language, the efforts for Turkish, one of the most widely used agglutinative languages in the world, can be counted on the fingers of one hand. Therefore, a solution is sought for the Turkish spam e-mail problem that takes the special characteristics of Turkish e-mails into consideration. The developed spam filtering framework has four components: morphological decomposition, feature selection, training, and test phases. A fixed-prefix stemming approach is used to extract the features of an e-mail, and the Mutual Information (MI) method is then applied for feature selection. Decision Tree (DT) and Artificial Neural Network (ANN) classifiers are employed, and the recognition accuracies obtained from these methods are satisfactory. The highest accuracy rates are 91.08% for ANN and 87.67% for DT when the dimensions of the feature vectors are (150×5) and (75×5), respectively.
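
The first two stages named in the abstract, fixed-prefix stemming and MI-based feature selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prefix length of 5, the binary presence/absence formulation of MI, the function names, and the four toy e-mails are all assumptions made for illustration, and the training and test phases (DT and ANN) are omitted.

```python
import math
from collections import Counter


def fixed_prefix_stem(word, prefix_len=5):
    # Fixed-prefix stemming: keep only the first prefix_len characters.
    # prefix_len=5 is an assumed value; the abstract does not state it.
    # Note: str.lower() is not Turkish-locale aware (dotted/dotless i).
    return word.lower()[:prefix_len]


def tokenize(text, prefix_len=5):
    # Represent an e-mail as the set of stems of its alphabetic tokens.
    return {fixed_prefix_stem(w, prefix_len) for w in text.split() if w.isalpha()}


def mutual_information(docs, labels, term):
    # MI (in bits) between term presence (True/False) and the class label.
    n = len(docs)
    joint = Counter((term in d, y) for d, y in zip(docs, labels))
    term_counts = Counter(term in d for d in docs)
    label_counts = Counter(labels)
    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        p_indep = (term_counts[present] / n) * (label_counts[label] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def select_features(docs, labels, k):
    # Keep the k stems with the highest MI; ties are broken alphabetically.
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-mutual_information(docs, labels, t), t))
    return ranked[:k]


def to_vector(doc, features):
    # Binary feature vector: 1 if the selected stem occurs in the e-mail.
    return [1 if f in doc else 0 for f in features]


# Hypothetical toy corpus, not data from the paper.
emails = [
    "bedava kampanya fırsat",
    "toplantı yarın saat",
    "bedava hediye kazan",
    "rapor toplantı notları",
]
labels = ["spam", "legit", "spam", "legit"]

docs = [tokenize(e) for e in emails]
features = select_features(docs, labels, k=2)
print(features)                        # ['bedav', 'topla']
print(to_vector(docs[0], features))    # [1, 0]
```

Varying `k` in `select_features` changes the length of the vector passed to the classifier, which is the kind of dimension sweep the abstract describes; the resulting vectors could then be fed to any decision tree or neural network implementation.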