Efficient information theoretic strategies for classifier combination, feature extraction and performance evaluation in improving false positives and false negatives for spam e-mail filtering.

Zorkadis V; Karras DA; Panayotou M

Abstract

Spam e-mails are considered a serious privacy-related violation as well as a costly, unsolicited communication. Various spam filtering techniques have been proposed so far, mainly based on Naive Bayesian algorithms. Other machine learning algorithms, such as boosting trees or Support Vector Machines (SVM), have also been used with success. However, the number of False Positives (FP) and False Negatives (FN) produced by the various spam e-mail filters remains too high, and from a practical viewpoint the problem of spam e-mail categorization cannot be considered completely solved. In this paper, we propose a novel approach to spam e-mail filtering based on efficient information theoretic techniques for integrating classifiers, for extracting improved features and for properly evaluating categorization accuracy in terms of FP and FN. The goal of the presented methodology is to empirically but explicitly minimize these FP and FN numbers by combining high-performance FP filters with high-performance FN filters that emerged from previous work by the authors [Zorkadis, V., Panayotou, M., & Karras, D. A. (2005). Improved spam e-mail filtering based on committee machines and information theoretic feature extraction. Proceedings of the International Joint Conference on Neural Networks, July 31-August 4, 2005, Montreal, Canada]. To this end, Random Committee-based filters and ADTree-based filters are efficiently combined through information theory. The experiments conducted are among the most extensive so far in the literature, exploiting widely accepted benchmark e-mail data sets and comparing the proposed methodology with the Naive Bayes spam filter as well as with the boosting tree methodology, classification via regression and other machine learning models. Novel information theoretic measures of FP and FN filtering performance show that the proposed approach compares very favorably with the rival methods. Finally, the proposed information theoretic Boolean features are found to yield remarkably high spam categorization performance.
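
The abstract does not give the formulas behind its information theoretic Boolean features or its FP/FN measures, so the following is only a minimal sketch of two standard building blocks such an approach might rest on: the mutual information between a Boolean feature (for example, whether a word occurs in a message) and the spam/ham label, and plain FP/FN counts for a filter's predictions. The function names, the label strings and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information I(X;Y) in bits between two equal-length discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of each (x, y) pair
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def fp_fn_counts(predicted, actual, positive="spam"):
    """Count false positives (ham flagged as spam) and false negatives (spam let through)."""
    fp = sum(1 for p, a in zip(predicted, actual) if p == positive and a != positive)
    fn = sum(1 for p, a in zip(predicted, actual) if p != positive and a == positive)
    return fp, fn


# Toy data, invented for this example only.
labels = ["spam", "spam", "ham", "ham"]
has_word = [True, True, False, False]  # Boolean feature that tracks the label exactly
noise = [True, False, True, False]     # Boolean feature independent of the label

informative_score = mutual_information(has_word, labels)
noise_score = mutual_information(noise, labels)
fp, fn = fp_fn_counts(["spam", "ham", "spam", "ham"], labels)
```

On this toy data the feature that tracks the label exactly scores one bit and the independent feature scores zero, which is the usual basis for ranking candidate Boolean features. Keeping FP and FN as separate counts, rather than folding them into a single accuracy figure, mirrors the abstract's emphasis on treating the two error types separately.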

Bibliographic record

Source
Neural Networks: The Official Journal of the International Neural Network Society | 2005, Issue 6 | 9 pages
Authors
Zorkadis V; Karras DA; Panayotou M
Original format
PDF
Language
English (eng)
Chinese Library Classification
Basic medicine
Keywords
Classification; Electronic Mail

Similar literature

1. Efficient information theoretic strategies for classifier combination, feature extraction and performance evaluation in improving false positives and false negatives for spam e-mail filtering. [J]. Zorkadis V, Karras DA, Panayotou M. Neural Networks: The Official Journal of the International Neural Network Society. 2005, Issue 5a6
2. False-positive Papanicolaou (Pap) test rates in the College of American Pathologists Pap Education and Pap Proficiency Test programs: evaluation of false-positive responses of high-grade squamous intraepithelial lesion or cancer to a negative reference diagnosis. [J]. Crothers B.A., Booth C.N., Darragh T.M. Archives of Pathology & Laboratory Medicine. 2014, Issue 5
3. Ultrasonographic features associated with false-negative and false-positive results of extrathyroidal extensions in papillary thyroid microcarcinoma. [J]. Lee Young Chan, Jung Ah Ra, Sohn Yu-Mee. European Archives of Oto-Rhino-Laryngology: Official Journal of the European Federation of Oto-Rhino-Laryngological Societies (EUFOS). 2018, Issue 11
4. Spam Filtering Issue: FPD Research between False Positive and False Negative. [C]. Liu Zhen, Zhou Ming-Tian. FSKD International Conference on Fuzzy Systems and Knowledge Discovery. 2007
5. Feature selection strategies for spam e-mail filtering. [D]. Wang, Ren. 2006
6. Causes and imaging features of false positives and false negatives on 18F-PET/CT in oncologic imaging. [O]. Niamh M. Long, Clare S. Smith. 2011
7. Maximum Margin Classifiers with Specified False Positive and False Negative Error Rates. [O]. J. Saketha Nath, C. Bhattacharyya. 2007
