Estimator learning automata for feature subset selection in high‐dimensional spaces, case study: Email spam detection

Seyyedi Seyyed Hossein; Minaei-Bidgoli Behrouz


Abstract

One of the difficult challenges facing data miners is that algorithm performance degrades when the feature space contains redundant or irrelevant features. Dimension reduction is therefore a critical preprocessing task, used to build a smaller space that contains only valuable features. There are two approaches to dimension reduction: feature extraction and feature selection, the latter being divided into wrapper and filter approaches. In high-dimensional spaces, feature extraction and wrapper approaches are not applicable because of their time complexity. The filter approach, on the other hand, suffers from inaccuracy. One main reason for this inaccuracy is that the size of the subset is not determined with the specifications of the problem in mind. In this paper, we propose ESS (estimator learning automaton-based subset selection) as a new method for feature selection in high-dimensional spaces. The innovation of ESS is that it combines wrapper and filter ideas and uses estimator learning automata to efficiently determine a feature subset that leads to a desirable tradeoff between the accuracy and the efficiency of the learning algorithm. To find a qualified subset for a particular processing algorithm operating on an arbitrary dataset, ESS uses an automaton to score each candidate subset based on the size of the subset and the accuracy of the learning algorithm that uses it. In the end, the subset with the highest score is returned. We have used ESS for feature selection in the framework of spam detection, a text classification task for email as a pervasive communication medium. The results indicate that ESS reaches the goal stated above.
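The abstract does not give the ESS update rules, so the following is only a minimal sketch of the general idea it describes: an estimator learning automaton that scores candidate subsets by accuracy and size and returns the best one. It assumes a pursuit-style estimator scheme (a standard estimator automaton, not necessarily the one used in the paper), treats each candidate as "the top k features of some filter ranking", and replaces classifier training with a synthetic score. The names `pursuit_select`, `toy_score`, `N_FEATURES`, the penalty weight 0.3 and the candidate sizes are all hypothetical.

```python
import math
import random

N_FEATURES = 1000  # hypothetical size of the full feature space


def toy_score(k, rng):
    """Hypothetical stand-in for 'train on the top-k features and measure accuracy'.

    Accuracy saturates as k grows; a size penalty rewards smaller subsets.
    """
    accuracy = 0.95 * (1.0 - math.exp(-k / 50.0)) + rng.gauss(0.0, 0.01)
    size_penalty = 0.3 * k / N_FEATURES
    return max(0.0, min(1.0, accuracy - size_penalty))


def pursuit_select(actions, reward_fn, steps=2000, lam=0.05, seed=0):
    """Pursuit-style estimator learning automaton over a finite action set."""
    rng = random.Random(seed)
    r = len(actions)
    probs = [1.0 / r] * r   # action probability vector
    total = [0.0] * r       # accumulated reward per action
    count = [0] * r         # number of times each action was tried

    # Try every action once so that each reward estimate is initialised.
    for i, action in enumerate(actions):
        total[i] += reward_fn(action, rng)
        count[i] += 1

    for _ in range(steps):
        # Sample an action according to the current probability vector.
        i = rng.choices(range(r), weights=probs)[0]
        total[i] += reward_fn(actions[i], rng)
        count[i] += 1

        # Running estimate of the mean reward of each action.
        estimates = [t / c for t, c in zip(total, count)]
        best = max(range(r), key=estimates.__getitem__)

        # Pursue the currently best-estimated action: p <- (1 - lam) * p + lam * e_best.
        probs = [(1.0 - lam) * p for p in probs]
        probs[best] += lam

    estimates = [t / c for t, c in zip(total, count)]
    best = max(range(r), key=estimates.__getitem__)
    return actions[best], estimates


candidate_sizes = [10, 50, 100, 200, 500, 1000]
best_k, estimates = pursuit_select(candidate_sizes, toy_score)
print("selected subset size:", best_k)
```

In a real setting, `toy_score` would be replaced by an evaluation of the chosen learning algorithm on the candidate subset, which is where the wrapper cost arises; the automaton's role in this sketch is to concentrate those evaluations on the promising subset sizes.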


Bibliographic details

Source: International journal of communication systems, 2018, issue 8, e3541.1-e3541.17 (17 pages)
Authors: Seyyedi Seyyed Hossein; Minaei-Bidgoli Behrouz
Affiliations:
Islamic Azad Univ, Kashan Branch, Dept Comp Engn, Kashan, Iran;
Iran Univ Sci & Technol, Sch Comp Engn, Tehran, Iran
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: data mining; dimension reduction; estimator learning automata; high-dimensional space; spam detection; text classification

