The investigation on the effect of feature vector dimension for spam email detection with a new framework

Abstract

This study investigates the effect of feature vector dimension on the classification of Turkish e-mails as spam or legitimate. Although hundreds of experimental studies have been carried out, especially for English, which is a non-agglutinative language, the efforts for Turkish, which is one of the most popular agglutinative languages in the world, can be counted on the fingers of one hand. Therefore, a solution to the Turkish spam e-mail problem is sought that takes the special characteristics of Turkish e-mails into consideration. The developed spam filtering framework has four components: morphological decomposition, feature selection, training, and testing. A fixed-prefix stemming approach is used to extract the features of an e-mail, and the Mutual Information (MI) method is then applied for feature selection. Decision Tree (DT) and Artificial Neural Network (ANN) classifiers are employed, and the recognition accuracies obtained from these methods are quite satisfactory. The highest accuracy rates are 91.08% for ANN and 87.67% for DT when the feature vector dimensions are (150×5) and (75×5), respectively.
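
The first two stages named in the abstract, fixed-prefix stemming followed by Mutual Information ranking, can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the prefix length of 5, the binary presence/absence formulation of MI, the toy e-mails, and the function names `fixed_prefix_stem`, `mutual_information`, and `select_features` are all hypothetical choices that the abstract does not specify.

```python
import math
from collections import Counter

# Assumed prefix length; the abstract does not state the value actually used.
PREFIX_LEN = 5


def fixed_prefix_stem(word, n=PREFIX_LEN):
    """Approximate a stem by keeping the first n characters of the word.

    Note: str.lower() is not Turkish-aware (dotted/dotless I), so a real
    system would need locale-specific case folding.
    """
    return word.lower()[:n]


def mutual_information(docs, labels):
    """Score each stem by the MI (in bits) between its presence in an
    e-mail and the class label, using simple document counts."""
    n_docs = len(docs)
    stem_sets = [{fixed_prefix_stem(w) for w in d.split()} for d in docs]
    vocab = set().union(*stem_sets)
    class_counts = Counter(labels)
    scores = {}
    for stem in vocab:
        mi = 0.0
        for present in (True, False):
            # Number of e-mails where the stem is present (or absent).
            n_t = sum(1 for s in stem_sets if (stem in s) == present)
            for c, n_c in class_counts.items():
                # Joint count: stem present/absent and class equals c.
                n_tc = sum(
                    1
                    for s, y in zip(stem_sets, labels)
                    if (stem in s) == present and y == c
                )
                if n_tc:  # zero joint counts contribute nothing
                    mi += (n_tc / n_docs) * math.log2(n_tc * n_docs / (n_t * n_c))
        scores[stem] = mi
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring stems (ties broken alphabetically)."""
    scores = mutual_information(docs, labels)
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]


# Toy data, invented purely for illustration.
emails = [
    "bedava kampanya indirim",
    "proje rapor ekip",
    "bedava hediye kazan",
    "proje takvim plan",
]
classes = ["spam", "legit", "spam", "legit"]
top_stems = select_features(emails, classes, k=2)
```

The selected stems would then define the feature vector passed to a classifier such as a decision tree or a neural network; varying `k` corresponds to varying the feature vector dimension that the study examines.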

Bibliographic details

Source
Iberian Conference on Information Systems and Technologies, 2014, pp. 1-4 (4 pages)
Authors
Ergin Semih; Isik Sahin
Original format: PDF
Keywords
Accuracy; Artificial neural networks; Feature extraction; Support vector machine classification; Text categorization; Unsolicited electronic mail; Spam; artificial neural networks; decision tree; e-mail; legitimate; mutual information;

Similar literature

1. Estimator learning automata for feature subset selection in high-dimensional spaces, case study: Email spam detection [J]. Seyyedi Seyyed Hossein, Minaei-Bidgoli Behrouz. International journal of communication systems. 2018, No. 8
2. A multi-agent system based for solving high-dimensional optimization problems: A case study on email spam detection [J]. Mohammadzadeh Hekmat, Gharehchopogh Farhad Soleimanian. International journal of communication systems. 2021, No. 3
3. Improved email spam detection model based on support vector machines [J]. Olatunji Sunday Olusanya. Neural computing & applications. 2019, No. 3
4. The investigation on the effect of feature vector dimension for spam email detection with a new framework [C]. Ergin Semih, Isik Sahin. Iberian Conference on Information Systems and Technologies. 2014
5. Moran's I Spacial Auto-correlation and Anomaly Detection Utilizing PCA and High Dimensional Feature Vectors [D]. Wong, Roy Y. 2017
6. Machine learning for email spam filtering: review approaches and open research problems [O]. Emmanuel Gbenga Dada, Joseph Stephen Bassi, Haruna Chiroma. 2019
7. A New Multi-Agent Approach for Solving Optimization Problems with High-Dimensional: Case Study in Email Spam Detection [O]. Hekmat Mohmmadzadeh, Farhad Soleimanian Gharehchopogh. 2020