Journal: International Journal of Communication Systems

Estimator learning automata for feature subset selection in high‐dimensional spaces, case study: Email spam detection

Abstract

One of the difficult challenges facing data miners is that algorithm performance degrades when the feature space contains redundant or irrelevant features. Dimension reduction is therefore a critical preprocessing task, used to build a smaller space that contains only valuable features. There are two approaches to dimension reduction: feature extraction and feature selection, the latter of which is divided into wrapper and filter approaches. In high-dimensional spaces, feature extraction and wrapper approaches are not applicable because of their time complexity. The filter approach, on the other hand, suffers from inaccuracy, and one main reason is that the size of the subset is not determined with the specifics of the problem in mind. In this paper, we propose ESS (estimator learning automaton-based subset selection), a new method for feature selection in high-dimensional spaces. The innovation of ESS is that it combines wrapper and filter ideas and uses estimator learning automata to efficiently determine a feature subset that leads to a desirable tradeoff between the accuracy and the efficiency of the learning algorithm. To find a suitable subset for a specific processing algorithm operating on an arbitrary dataset, ESS uses an automaton to score each candidate subset according to the size of the subset and the accuracy of the learning algorithm that uses it. In the end, the subset with the highest score is returned. We have applied ESS to feature selection for spam detection, a text classification task for email, which is a pervasive communication medium. The results show that ESS reaches the goal stated above.
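
The abstract does not give the details of ESS, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes a simple filter (difference of class means) to rank features, candidate subsets made of the top-k ranked features, a nearest-centroid classifier as the wrapped learner, and a continuous pursuit scheme as the estimator learning automaton. All names and parameters (`filter_rank`, `centroid_accuracy`, `select_subset`, `size_penalty`, `rate`, `make_toy_data`) are hypothetical and chosen for illustration.

```python
import random


def filter_rank(X, y):
    """Filter step (assumed): rank features by |mean(class 1) - mean(class 0)|.

    Assumes binary labels 0/1 and that both classes occur in y.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_accuracy(X, y, features, rng):
    """Wrapper step (assumed): hold-out accuracy of a nearest-centroid classifier."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = len(idx) // 2
    train, test = idx[:cut], idx[cut:]
    centroids = {}
    for c in (0, 1):
        rows = [X[i] for i in train if y[i] == c]
        if not rows:  # degenerate split: no training example of class c
            return 0.0
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in features]
    correct = 0
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(features, centroids[c]))
                for c in (0, 1)}
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(test)


def select_subset(X, y, sizes, size_penalty=0.2, steps=300, rate=0.05, seed=0):
    """Pick a top-k subset with a pursuit-style estimator automaton.

    Each action is one candidate size k. The score of an action trades
    accuracy against subset size (the weighting is an assumption).
    """
    rng = random.Random(seed)
    ranked = filter_rank(X, y)
    actions = range(len(sizes))
    probs = [1.0 / len(sizes)] * len(sizes)
    totals = [0.0] * len(sizes)
    counts = [0] * len(sizes)

    def score(a):
        accuracy = centroid_accuracy(X, y, ranked[:sizes[a]], rng)
        return accuracy - size_penalty * sizes[a] / len(ranked)

    for a in actions:  # try every action once to initialise the estimates
        totals[a] += score(a)
        counts[a] += 1
    for _ in range(steps):
        a = rng.choices(list(actions), weights=probs)[0]
        totals[a] += score(a)
        counts[a] += 1
        best = max(actions, key=lambda i: totals[i] / counts[i])
        # Pursuit update: move the probability vector toward the action
        # whose running score estimate is currently the highest.
        probs = [(1 - rate) * p + (rate if i == best else 0.0)
                 for i, p in enumerate(probs)]
    best = max(actions, key=lambda i: totals[i] / counts[i])
    return ranked[:sizes[best]], probs


def make_toy_data(n=200, n_features=20, seed=1):
    """Synthetic data: only features 0 and 1 carry class information."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3 * label
        row[1] += 3 * label
        X.append(row)
        y.append(label)
    return X, y
```

Calling `select_subset(X, y, sizes=[1, 2, 5, 10, 20])` on the output of `make_toy_data()` returns the chosen feature indices together with the final action probabilities. In this sketch the filter keeps the candidate set small, while the automaton spends most of its wrapper evaluations on the sizes that currently look best, which is one way to read the accuracy and efficiency tradeoff the abstract mentions.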
