iSRD: Spam review detection with imbalanced data distributions

Abstract

The Internet plays an essential role in modern information systems. Applications such as e-commerce websites have made it common for people to purchase many types of products online. When shopping online, users often rely on reviews written by previous customers to make their final decision. Because online reviews strongly influence the sale of online products (or services), some vendors (or customers) post fake/spam reviews to mislead customers. False reviews can cause unfair market competition and financial loss for customers or vendors. In this research, we aim to distinguish spam from non-spam reviews using supervised classification methods. A key challenge in training a classifier to separate spam from non-spam reviews is that spam makes up only a very small portion of all online reviews. This naturally creates a data imbalance problem: learning methods that do not emphasize minority samples (i.e., spam) may detect spam reviews poorly, even when their overall accuracy is relatively high. To tackle this challenge, we employ a bagging-based approach to build a number of balanced datasets, train a set of spam classifiers on them, and use their ensemble to detect review spam. Experiments and comparisons demonstrate that our method, iSRD, outperforms baseline methods for review spam detection.
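The abstract does not say exactly how the balanced datasets are drawn or which base classifier is used, so the following is only a minimal sketch of the general balanced-bagging idea, not the iSRD implementation. It assumes labels `1` = spam and `0` = non-spam, numeric feature vectors, and that each bag keeps every spam review plus a bootstrap sample (with replacement) of non-spam reviews of the same size. The base learner `NearestCentroid` and the helpers `balanced_bagging` and `ensemble_predict` are hypothetical names chosen for illustration; predictions are combined by majority vote.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Hypothetical stand-in base learner: predicts the label with the closest class centroid."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {
            label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)
        }
        return self

    def predict_one(self, x: Vector) -> int:
        return min(self.centroids, key=lambda label: sq_dist(x, self.centroids[label]))


def balanced_bagging(X: List[Vector], y: List[int], n_bags: int = 11, seed: int = 0) -> List[NearestCentroid]:
    """Train one base model per balanced bag (assumption: all spam + equal-size bootstrap of non-spam)."""
    rng = random.Random(seed)
    spam = [i for i, t in enumerate(y) if t == 1]
    ham = [i for i, t in enumerate(y) if t == 0]
    if not spam or not ham:
        raise ValueError("both spam (1) and non-spam (0) samples are required")
    models = []
    for _ in range(n_bags):
        # Bootstrap the majority class down to the minority-class size so each bag is balanced.
        idx = spam + rng.choices(ham, k=len(spam))
        models.append(NearestCentroid().fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def ensemble_predict(models: List[NearestCentroid], x: Vector) -> int:
    """Majority vote over the base models' predictions."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy imbalanced data: 8 non-spam points near the origin, 2 spam points near (5, 5).
    X = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.4], [0.3, 0.1], [0.2, 0.3],
         [0.4, 0.5], [0.6, 0.0], [0.0, 0.6], [5.0, 5.0], [5.5, 4.8]]
    y = [0] * 8 + [1] * 2
    models = balanced_bagging(X, y, n_bags=5, seed=42)
    print(ensemble_predict(models, [5.2, 5.1]), ensemble_predict(models, [0.1, 0.1]))
```

Using an odd number of bags avoids ties when voting between two classes. Because every bag sees the same number of spam and non-spam samples, no single base model is dominated by the majority class, which is the motivation the abstract gives for building balanced datasets.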