A novel committee selection mechanism for combining classifiers to detect unsolicited emails

Shrawan Kumar Trivedi; Shubhamoy Dey

Abstract

Purpose - Email is an important medium for sharing information rapidly. However, spam is a nuisance in such communication and motivates the building of a robust filtering system with high classification accuracy and good sensitivity towards false positives. In that context, this paper presents a combined classifier technique using a committee selection mechanism, where the main objective is to identify a set of classifiers whose individual decisions can be combined by a committee selection procedure for accurate detection of spam.

Design/methodology/approach - Text mining approaches are used in this research for training and testing the relevant machine learning classifiers. Three data sets (Enron, SpamAssassin and LingSpam) have been used to test the classifiers. Initially, pre-processing is performed to extract the features associated with the email files. In the next step, the extracted features are taken through a dimensionality reduction method in which non-informative features are removed. Subsequently, an informative feature subset is selected using genetic feature search. Thereafter, the proposed classifiers are tested on those informative features and the results are compared with those of other classifiers.

Findings - Three different studies have been performed to build the proposed combined classifier. The first study identifies the effect of boosting algorithms on two probabilistic classifiers: Bayesian and Naïve Bayes. In that study, AdaBoost was found to be the best algorithm for performance boosting. The second study examined the effect of different kernel functions on the support vector machine (SVM) classifier, where SVM with the normalized polynomial (NP) kernel was observed to be the best. The last study combined classifiers with committee selection, where the committee members were the best classifiers identified by the first study, i.e. Bayesian and Naïve Bayes with AdaBoost, and the committee president was selected from the second study, i.e. SVM with NP kernel. Results show that combining the identified classifiers to form a committee machine gives excellent accuracy with a low false positive rate.

Research limitations/implications - This research is focused on the classification of email spam written in English. Only the body (text) parts of the emails have been used. Image spam has not been included in this work. The work is restricted to email messages only; other types of messages, such as short message service or multi-media messaging service, were not part of this study.

Practical implications - This research proposes a method of dealing with the issues and challenges faced by internet service providers and organizations that use email. The proposed model provides not only better classification accuracy but also a low false positive rate.

Originality/value - The proposed combined classifier is a novel classifier designed for accurate classification of email spam.
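The abstract names committee members and a committee president but does not state how their decisions are combined. The sketch below illustrates one plausible rule, purely as an assumption: when the members agree, their shared label is returned, and when they disagree, the president decides. The function `committee_decision` and the three toy keyword classifiers are hypothetical stand-ins, not the boosted Bayesian, boosted Naïve Bayes or SVM models evaluated in the paper.

```python
from typing import Callable, Sequence

Label = str  # either "spam" or "ham"
Classifier = Callable[[str], Label]


def committee_decision(
    email_text: str,
    members: Sequence[Classifier],
    president: Classifier,
) -> Label:
    """Assumed combination rule (not taken from the paper):
    unanimous members decide; otherwise the president decides."""
    votes = [member(email_text) for member in members]
    if len(set(votes)) == 1:
        # All members returned the same label.
        return votes[0]
    # Members disagree (or there are none): defer to the president.
    return president(email_text)


# Toy stand-ins for trained classifiers (hypothetical, for illustration only).
def member_a(text: str) -> Label:
    return "spam" if "free" in text.lower() else "ham"


def member_b(text: str) -> Label:
    return "spam" if "winner" in text.lower() else "ham"


def president(text: str) -> Label:
    return "spam" if text.count("!") >= 2 else "ham"


if __name__ == "__main__":
    for message in [
        "You are a WINNER of a FREE prize",
        "Meeting at noon",
        "Free lunch on Friday",
        "Winner!! Claim now!",
    ]:
        print(message, "->", committee_decision(message, [member_a, member_b], president))
```

In a real setting each callable would wrap a trained model's prediction on the selected feature subset; the tie-breaking rule shown here is only one of several ways a president could be given extra weight.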

Bibliographic details

Source
VINE journal of information and knowledge management systems | 2016, Issue 4 | 25 pages
Authors
Shrawan Kumar Trivedi; Shubhamoy Dey
Original format: PDF
Language: English
Chinese Library Classification: Library automation and networking
Keywords
SVM; Bayesian; Probabilistic classifiers; Naïve Bayes; Function-based classifiers; Kernel functions; Combining classifiers; Stacking; Committee machine

Similar literature

1. A novel committee selection mechanism for combining classifiers to detect unsolicited emails [J]. Shrawan Kumar Trivedi, Shubhamoy Dey. VINE journal of information and knowledge management systems. 2016, Issue 4
2. We’re Here for You: The Unsolicited Covid-19 Email [J]. Kristin Winet, Ryan L. Winet. Journal of Business and Technical Communication. 2021, Issue 1
3. A modified content-based evolutionary approach to identify unsolicited emails [J]. Trivedi Shrawan Kumar, Dey Shubhamoy. Knowledge and information systems. 2019, Issue 3
4. A Combining Classifiers Approach for Detecting Email Spams [C]. Shrawan Kumar Trivedi, Shubhamoy Dey. IEEE International Conference on Advanced Information Networking and Applications. 2016
5. Exploring the use of unsolicited email in EFL education in Taiwan: Authentic and critical literacy in context [D]. Li, Pei Fen. 2007
6. Automatic replies can be sent to unsolicited email from general public [O]. Christopher Oliver. 1999
7. A Study of Bayesian Classifiers Detecting Gratuitous Email Spamming [O]. Garima Jain. 2016
