Spam is a key problem in electronic communication. Its increasing volume has become a serious threat not only to the Internet but also to society. Content-based filtering is one mainstream method of combating this threat in its various forms, but previous content-based filtering methods struggle to balance efficiency and effectiveness. In this paper we seek a linear solution to this problem: two online linear classifiers, the Perceptron and Winnow, are explored for this task on three benchmark corpora, namely the English corpora PU1 and Lingspam and the Chinese corpus 2005-Jun. Our experiments show that both classifiers filter spam emails effectively as well as efficiently, and that they perform much better than a standard Naïve Bayes method. In fact, to the best of our knowledge, they achieve state-of-the-art performance for filtering Chinese spam emails, at least on the above corpora. Furthermore, both classifiers are easy to update adaptively and are thus suitable for real, dynamic environments.
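To illustrate why such classifiers are cheap to run and easy to update adaptively, the following is a minimal sketch of the textbook mistake-driven updates for the Perceptron (additive) and Winnow (multiplicative) over binary word features. It is not taken from the paper: the function names, the toy messages, the threshold, and the promotion and demotion factors are all assumptions chosen for illustration, and the variants evaluated in the experiments may differ.

```python
from collections import defaultdict


def perceptron_update(weights, features, label, lr=1.0):
    """One online Perceptron step; label is +1 (spam) or -1 (legitimate)."""
    score = sum(weights[f] for f in features)
    predicted = 1 if score > 0 else -1
    if predicted != label:
        # Additive correction, applied only when the prediction is wrong.
        for f in features:
            weights[f] += lr * label
    return predicted


def winnow_update(weights, features, label, threshold,
                  promotion=2.0, demotion=0.5):
    """One online Winnow step; weights are assumed to start at 1.0."""
    score = sum(weights[f] for f in features)
    predicted = 1 if score > threshold else -1
    if predicted != label:
        # Multiplicative correction: promote on a missed spam message,
        # demote on a false alarm.
        factor = promotion if label == 1 else demotion
        for f in features:
            weights[f] *= factor
    return predicted


# Hypothetical toy stream of (set of words, label) pairs.
stream = [
    ({"cheap", "pills"}, 1),
    ({"meeting", "tomorrow"}, -1),
    ({"cheap", "offer"}, 1),
    ({"project", "meeting"}, -1),
]

p_weights = defaultdict(float)         # Perceptron weights start at 0.0
w_weights = defaultdict(lambda: 1.0)   # Winnow weights start at 1.0

for words, label in stream:
    perceptron_update(p_weights, words, label)
    winnow_update(w_weights, words, label, threshold=2.0)
```

Each step touches only the features present in the current message, so the cost per email is linear in its length, and a deployed filter can keep learning from newly labelled mail by calling the same update function.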