Improving email filtering (ham vs. spam) is an important task. The objective of this paper is to increase filtering accuracy and decrease processing time. It discusses different scenarios for a Principal Component Analysis-Document Reconstruction (PCADR) classifier applied to email filtering. The study highlights how the accuracy of the PCADR classifier varies with the feature preprocessing. Four scenarios were considered. Scenario 1: Ham and Spam classes are represented with different features. Scenario 2: Ham and Spam classes are represented with the same features. Scenario 3: Ham and Spam classes are represented with common terms. Scenario 4: Ham and Spam classes are represented with common features and characteristic terms. Experiments were carried out on a public corpus extracted from the University of California-Irvine Machine Learning Repository, using different training and test sets. PCADR was also compared with a Support Vector Machine and a Bayes detector to demonstrate its superior performance.
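
The idea behind a PCA document-reconstruction classifier can be illustrated with a minimal sketch. It assumes (the abstract does not spell this out) that one PCA model is fitted per class and that a message is assigned to the class whose principal subspace reconstructs it with the smallest error. The `ClassPCA` helper, the `classify` function, the four-term vocabulary, and the toy term counts are all hypothetical and are not taken from the paper; both classes share the same features here, as in Scenario 2.

```python
import numpy as np


class ClassPCA:
    """PCA model for a single class (hypothetical helper, not from the paper)."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        # Center the class documents and keep the leading principal directions.
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]  # rows are directions
        return self

    def reconstruction_error(self, x):
        # Project the centered document onto the class subspace and measure
        # the Euclidean norm of what the projection fails to capture.
        centered = x - self.mean_
        projected = centered @ self.components_.T @ self.components_
        return float(np.linalg.norm(centered - projected))


def classify(x, models):
    """Return the label whose PCA model reconstructs x with the least error."""
    return min(models, key=lambda label: models[label].reconstruction_error(x))


# Toy term-count vectors over an assumed vocabulary:
# ["meeting", "report", "free", "winner"]
ham = np.array([[3, 2, 0, 0], [2, 3, 0, 0], [4, 1, 1, 0]], dtype=float)
spam = np.array([[0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 4, 1]], dtype=float)

models = {
    "ham": ClassPCA(n_components=1).fit(ham),
    "spam": ClassPCA(n_components=1).fit(spam),
}

label_a = classify(np.array([3.0, 2.0, 0.0, 0.0]), models)
label_b = classify(np.array([0.0, 0.0, 3.0, 2.0]), models)
print(label_a, label_b)
```

Under this assumption, the four scenarios would differ only in how the feature vectors passed to each class model are built, which is why feature preprocessing can change the accuracy of the classifier.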