Journal: 《电子科学学刊(英文版)》

AN EFFICIENT APPROACH TO COMMENT SPAM IDENTIFICATION


Abstract

This paper proposes a novel approach to comment spam identification based on content analysis. Three main features are used: the number of links, content repetitiveness, and text similarity. Content repetitiveness is determined by the length and frequency of the longest common substring, and text similarity is calculated using the vector space model. In preliminary experiments on Chinese and English comments, precision reaches 93% and 82%, respectively. The results show the validity and language independence of the approach. Compared with conventional spam filtering approaches, the method requires no training, no rule sets, and no link relationships. It handles both new and existing comments.
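
The abstract names the three features but not how they are computed. The following is a minimal sketch of one plausible reading, not the authors' implementation. The function names, the URL pattern, the whitespace tokenization, and the use of raw term frequencies are all assumptions made for illustration. For Chinese text, a character-level or segmented tokenization would be needed in place of `split()`.

```python
import math
import re
from collections import Counter


def count_links(comment: str) -> int:
    """Count URL-like substrings (assumed pattern: http(s):// or www. prefixes)."""
    return len(re.findall(r"(?:https?://|www\.)\S+", comment))


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b.

    Standard dynamic programming over suffix-match lengths,
    O(len(a) * len(b)) time, O(len(b)) memory.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of term-frequency vectors (vector space model).

    Whitespace tokenization and raw counts are assumptions; the paper
    may use a different tokenizer or term weighting.
    """
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this reading, a comment would be scored by combining `count_links(comment)`, the length of `longest_common_substring` against other comments (and how often that substring recurs), and `cosine_similarity` between the comment and the post it is attached to. How the paper combines or thresholds these values is not stated in the abstract.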
