Neural Networks: The Official Journal of the International Neural Network Society

Efficient information theoretic strategies for classifier combination, feature extraction and performance evaluation in improving false positives and false negatives for spam e-mail filtering.


Abstract

Spam e-mails are considered a serious privacy-related violation, besides being costly, unsolicited communication. Various spam filtering techniques have been proposed so far, mainly based on Naive Bayesian algorithms. Other machine learning algorithms, such as boosting trees and Support Vector Machines (SVM), have also been used with success. However, the number of False Positives (FP) and False Negatives (FN) resulting from applying various spam e-mail filters still remains too high, and the problem of spam e-mail categorization cannot be solved completely from a practical viewpoint. In this paper, we propose a novel approach for spam e-mail filtering based on efficient information theoretic techniques for integrating classifiers, for extracting improved features and for properly evaluating categorization accuracy in terms of FP and FN. The goal of the presented methodology is to empirically but explicitly minimize these FP and FN numbers by combining high-performance FP filters with high-performance FN filters emerging from a previous work of the authors [Zorkadis, V., Panayotou, M., & Karras, D. A. (2005). Improved spam e-mail filtering based on committee machines and information theoretic feature extraction. Proceedings of the International Joint Conference on Neural Networks, July 31-August 4, 2005, Montreal, Canada]. To this end, Random Committee-based filters and ADTree-based filters are efficiently combined through information theory. The experiments conducted are among the most extensive so far in the literature, exploiting widely accepted benchmark e-mail data sets and comparing the proposed methodology with the Naive Bayes spam filter as well as with the boosting tree methodology, classification via regression and other machine learning models. Novel information theoretic measures of FP and FN filtering performance show that the proposed approach compares very favorably with the rival methods. Finally, it is found that the proposed information theoretic Boolean features present a remarkably high spam categorization performance.
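
As an illustration only: the abstract does not define its information theoretic measures or features, so the following Python sketch assumes two standard quantities that fit the description. It computes the FP and FN rates of a filter and the mutual information (in bits) between a Boolean feature and the spam/legitimate label. The function names `fp_fn_rates` and `mutual_information`, and the label convention (1 = spam, 0 = legitimate), are hypothetical choices and are not taken from the paper.

```python
import math
from collections import Counter


def fp_fn_rates(y_true, y_pred):
    """Return (FP rate, FN rate). Assumed convention: 1 = spam, 0 = legitimate."""
    # FP: legitimate mail flagged as spam. FN: spam that passed the filter.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n_legit = sum(1 for t in y_true if t == 0)
    n_spam = sum(1 for t in y_true if t == 1)
    fpr = fp / n_legit if n_legit else 0.0
    fnr = fn / n_spam if n_spam else 0.0
    return fpr, fnr


def mutual_information(feature, labels):
    """Mutual information I(F; C) in bits between a Boolean feature and the class."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts = Counter(feature)
    c_counts = Counter(labels)
    mi = 0.0
    for (f, c), n_fc in joint.items():
        # p(f, c) * log2( p(f, c) / (p(f) * p(c)) ), written with raw counts.
        mi += (n_fc / n) * math.log2(n_fc * n / (f_counts[f] * c_counts[c]))
    return mi


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(fp_fn_rates(labels, [1, 0, 1, 0]))
    print(mutual_information([1, 1, 0, 0], labels))
```

A feature with high mutual information separates spam from legitimate mail well, which is one plausible reading of "information theoretic Boolean features"; the paper's actual feature construction and combination rule may differ.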
