Journal: VINE Journal of Information and Knowledge Management Systems

A novel committee selection mechanism for combining classifiers to detect unsolicited emails


Abstract

Purpose - Email is an important medium for sharing information rapidly. However, spam is a nuisance in such communication, and it motivates the building of a robust filtering system with high classification accuracy and good sensitivity towards false positives. In that context, this paper presents a combined classifier technique using a committee selection mechanism. The main objective is to identify a set of classifiers whose individual decisions can be combined by a committee selection procedure for accurate detection of spam.

Design/methodology/approach - Text mining approaches are used to train and test the relevant machine learning classifiers. Three data sets (Enron, SpamAssassin and LingSpam) have been used to test the classifiers. First, pre-processing is performed to extract the features associated with the email files. Next, the extracted features are passed through a dimensionality reduction method that removes non-informative features. Subsequently, an informative feature subset is selected using genetic feature search. Finally, the proposed classifiers are tested on those informative features and the results are compared with those of other classifiers.

Findings - Three different studies have been performed to build the proposed combined classifier. The first study identifies the effect of boosting algorithms on two probabilistic classifiers: Bayesian and Naïve Bayes. In that study, AdaBoost was found to be the best algorithm for performance boosting. The second study examines the effect of different kernel functions on the support vector machine (SVM) classifier, where SVM with the normalized polynomial (NP) kernel was observed to be the best. The last study combines classifiers with committee selection: the committee members are the best classifiers identified by the first study, i.e. Bayesian and Naïve Bayes with AdaBoost, and the committee president is selected from the second study, i.e. SVM with the NP kernel. Results show that combining the identified classifiers to form a committee machine gives excellent accuracy with a low false positive rate.

Research limitations/implications - This research focuses on the classification of email spam written in English. Only the body (text) parts of the emails have been used. Image spam has not been included in this work. The work is restricted to email messages; other message types, such as short message service or multimedia messaging service, were not part of this study.

Practical implications - This research proposes a method of dealing with the issues and challenges faced by internet service providers and organizations that use email. The proposed model provides not only better classification accuracy but also a low false positive rate.

Originality/value - The proposed combined classifier is a novel classifier designed for accurate classification of email spam.
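
The abstract does not spell out how the president's decision is combined with the members' decisions. As one possible reading, the Python sketch below assumes that a unanimous vote of the committee members stands, and that the president decides only when the members disagree. The three keyword rules are hypothetical stand-ins for the boosted Bayesian, boosted Naïve Bayes and SVM (NP kernel) classifiers; they are not the paper's models.

```python
from typing import Callable, Sequence

Label = str  # "spam" or "ham"
Classifier = Callable[[str], Label]


def committee_decision(members: Sequence[Classifier],
                       president: Classifier,
                       email_body: str) -> Label:
    """Combine member votes; defer to the president on disagreement.

    Assumed rule (not taken from the paper): if all members agree,
    return their label, otherwise return the president's label.
    """
    votes = [member(email_body) for member in members]
    if len(set(votes)) == 1:
        return votes[0]
    return president(email_body)


# Hypothetical toy classifiers, used only to make the sketch runnable.
def member_a(text: str) -> Label:
    return "spam" if "free" in text.lower() else "ham"


def member_b(text: str) -> Label:
    return "spam" if "winner" in text.lower() else "ham"


def president(text: str) -> Label:
    return "spam" if text.count("!") >= 2 else "ham"


if __name__ == "__main__":
    for body in ["Free prize for the winner",
                 "Meeting at noon",
                 "Free lunch at noon",
                 "Free lunch!! Act now"]:
        print(body, "->", committee_decision([member_a, member_b], president, body))
```

With two members and one president, this rule gives the same result as a three-way majority vote; the two differ once the committee has more members.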
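
The abstract names the normalized polynomial (NP) kernel but does not give its form. A common definition divides a polynomial kernel by the geometric mean of its two self-similarities, K'(x, y) = K(x, y) / sqrt(K(x, x) K(y, y)). The Python sketch below assumes that definition with K(x, y) = (x . y + 1)^d; the function names, the +1 offset and the default degree are illustrative choices, not details from the paper.

```python
import math
from typing import Sequence


def poly_kernel(x: Sequence[float], y: Sequence[float], degree: int = 2) -> float:
    """Polynomial kernel (x . y + 1)^degree (assumed form)."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


def normalized_poly_kernel(x: Sequence[float], y: Sequence[float],
                           degree: int = 2) -> float:
    """Polynomial kernel rescaled so that every vector has self-similarity 1."""
    return poly_kernel(x, y, degree) / math.sqrt(
        poly_kernel(x, x, degree) * poly_kernel(y, y, degree)
    )


if __name__ == "__main__":
    # Two orthogonal unit vectors: K = 1, K(x, x) = K(y, y) = 4, so K' = 0.25.
    print(normalized_poly_kernel([1.0, 0.0], [0.0, 1.0]))
```

Normalizing this way bounds the kernel value at 1, so the length of an email's feature vector does not dominate the similarity score.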
