Indonesian Journal of Computing and Cybernetics Systems

Comparison of Filter and Wrapper Based Feature Selection Methods on Spam Comment Classification

Abstract

The continuous growth of the internet has increased the use of social media for various purposes. Some irresponsible parties, for instance, exploit the comment feature of social media platforms to harm others by posting spam comments on shared content. In addition, the variety of comments produces a large number of features to process, which degrades the performance of classification algorithms. This study therefore addresses the spam comment problem by comparing filter-based and wrapper-based feature selection methods for text classification. Results on training and test sets of 4,944 and 100 comments, respectively, showed that the best accuracy, precision, recall, and F-measure of Multinomial Naive Bayes (MNB) are 96%, 100%, 92%, and 95.8%, respectively. The best accuracy is achieved with feature selection that combines the Chi-Square (filter) and Sequential Forward Selection (wrapper) methods, using a subset of 500 features. Furthermore, the accuracy of the MNB and Support Vector Machine (SVM) classifiers increases by 8% and 4%, respectively. This research concludes that combining feature selection methods improves the classification performance for Indonesian-language spam comments.
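The abstract describes a two-stage combination: a Chi-Square filter followed by Sequential Forward Selection (SFS), a wrapper. The sketch below shows one way such a combination can work. It is a minimal illustration under stated assumptions, not the authors' implementation: chi-square scores are computed on binary term presence, the wrapped classifier is a Laplace-smoothed Multinomial Naive Bayes, and SFS maximizes validation accuracy. The toy corpus, the function names (`chi_square_top_k`, `sequential_forward_selection`, etc.), and the sizes `k=4` and `n_select=2` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data for illustration only; NOT the paper's dataset.
# Each item is (tokens, label) with label 1 = spam, 0 = not spam.
TRAIN = [
    ("free promo click link".split(), 1),
    ("click follow back please".split(), 1),
    ("click link free gift".split(), 1),
    ("nice photo great shot".split(), 0),
    ("great photo friend".split(), 0),
    ("nice view friend".split(), 0),
]
VALID = [("free click now".split(), 1), ("great friend photo".split(), 0)]


def chi_square(data, term):
    """Chi-square statistic of a term's presence vs. the spam label (2x2 table)."""
    a = sum(1 for toks, y in data if term in toks and y == 1)      # spam, has term
    b = sum(1 for toks, y in data if term in toks and y == 0)      # non-spam, has term
    c = sum(1 for toks, y in data if term not in toks and y == 1)  # spam, lacks term
    d = sum(1 for toks, y in data if term not in toks and y == 0)  # non-spam, lacks term
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def chi_square_top_k(data, k):
    """Filter step: keep the k highest-scoring terms (ties broken alphabetically)."""
    vocab = sorted({t for toks, _ in data for t in toks})
    return sorted(vocab, key=lambda t: (-chi_square(data, t), t))[:k]


def train_mnb(data, features):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to `features`."""
    feats = set(features)
    priors, likelihoods = {}, {}
    for c in (0, 1):
        docs = [toks for toks, y in data if y == c]
        counts = Counter(t for toks in docs for t in toks if t in feats)
        total = sum(counts.values())
        priors[c] = math.log(len(docs) / len(data))
        likelihoods[c] = {t: math.log((counts[t] + 1) / (total + len(feats))) for t in feats}
    return priors, likelihoods


def predict(model, toks):
    priors, likelihoods = model
    # Tokens outside the selected feature set are ignored.
    scores = {c: priors[c] + sum(likelihoods[c].get(t, 0.0) for t in toks) for c in (0, 1)}
    return max((0, 1), key=lambda c: scores[c])  # ties go to class 0


def accuracy(model, data):
    return sum(predict(model, toks) == y for toks, y in data) / len(data)


def sequential_forward_selection(train, valid, candidates, n_select):
    """Wrapper step: greedily add the candidate that maximizes validation accuracy."""
    selected, best_acc = [], 0.0
    while len(selected) < min(n_select, len(candidates)):
        best_term, best_acc = None, -1.0
        for term in candidates:
            if term in selected:
                continue
            acc = accuracy(train_mnb(train, selected + [term]), valid)
            if acc > best_acc:  # strict '>' keeps the earliest candidate on ties
                best_term, best_acc = term, acc
        selected.append(best_term)
    return selected, best_acc


shortlist = chi_square_top_k(TRAIN, k=4)
print(shortlist)  # → ['click', 'free', 'friend', 'great']
selected, val_acc = sequential_forward_selection(TRAIN, VALID, shortlist, n_select=2)
print(selected, val_acc)  # → ['click', 'friend'] 1.0
```

The cheap filter prunes the vocabulary first, so the expensive wrapper only retrains the classifier over a short candidate list. On real data both `k` and `n_select` would be far larger (the abstract reports a best subset of 500 features).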
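The reported MNB figures can be checked for internal consistency. Assuming the F-measure is the standard F1 (the harmonic mean of precision and recall, F = 2PR / (P + R)), which the abstract does not state explicitly, a precision of 100% and a recall of 92% should yield the reported 95.8%:

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = f_measure(1.00, 0.92)
print(f"{f1:.1%}")  # → 95.8%
```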