Journal: Expert Systems with Applications

Semi-supervised learning using frequent itemset and ensemble learning for SMS classification

Abstract

Short Message Service (SMS) has become one of the most important communication media, owing to the rapid growth in mobile users and its easy-to-use operating mechanism. This flood of SMS traffic brings with it the problem of spam SMS generated by spurious users. Spam SMS detection has attracted growing attention from researchers in recent years and has been tackled with a number of different machine learning approaches. The supervised machine learning approaches used so far demand a large amount of labeled data, which is not always available in real applications. Traditional semi-supervised methods can alleviate this problem but may not produce good results when they are given only positive and unlabeled data. In this paper, we propose a novel semi-supervised learning method that uses frequent itemsets and ensemble learning (FIEL) to overcome this limitation. In this approach, the Apriori algorithm is used to find the frequent itemsets, while Multinomial Naive Bayes, Random Forest and LibSVM serve as base learners for an ensemble that combines their predictions by majority voting. The proposed approach works well, with higher accuracy, when given a small number of positive examples and varying amounts of unlabeled data. Extensive experiments were conducted on the UCI SMS spam collection data set and on the SMS spam collection Corpus v.0.1 Small and Big; they show significant improvements in accuracy with a very small amount of positive data. We compared the proposed FIEL approach with the existing SPY-EM and PEBL approaches, and the results show that our approach is more stable than the compared approaches with minimum support.
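The two building blocks named in the abstract, Apriori frequent-itemset mining and majority voting over base learners, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the mined itemsets feed the ensemble, so the sketch only shows each piece on its own. The function names `frequent_itemsets` and `majority_vote`, the toy messages, and the 0.75 support threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori sketch: return {itemset: support} for every itemset whose
    support (fraction of transactions containing it) is >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: counts[c] / n for c in candidates
                 if counts[c] / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def majority_vote(predictions):
    """Return the label predicted by the largest number of base learners."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical tokenized spam messages (toy data, not from the paper).
spam_messages = [
    {"free", "prize", "call"},
    {"free", "prize", "now"},
    {"free", "call", "now"},
    {"win", "prize", "free"},
]
itemsets = frequent_itemsets(spam_messages, min_support=0.75)

# Hypothetical outputs of three base learners for one message.
label = majority_vote(["spam", "ham", "spam"])
```

With three base learners and two classes, as in the abstract, a majority always exists, so the sketch does not need a tie-breaking rule; a different number of learners or classes would.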
