Multimedia Tools and Applications

Resampling imbalanced data to detect fake reviews using machine learning classifiers and textual-based features



Abstract

Fraudulent online sellers often collude with reviewers to garner fake reviews for their products. This practice undermines buyers' trust in product reviews and potentially reduces the effectiveness of online markets. Being able to accurately detect fake reviews is therefore critical. In this study, we investigate several preprocessing and textual-based featuring methods along with machine learning classifiers, including single and ensemble models, to build a fake review detection system. Given the nature of product review data, where the number of fake reviews is far smaller than that of genuine reviews, we examine the results for each class in detail in addition to the overall results. Our preliminary analysis shows that, owing to the imbalanced data, there is a large gap between the per-class accuracies (e.g., 1.3% for the fake review class and 99.7% for the genuine review class), even though the overall accuracy looks promising (around 89.7%). We propose two dynamic random sampling techniques, applicable to textual-based featuring methods, to address this class imbalance problem. Our results indicate that both sampling techniques improve the accuracy of the fake review class: for balanced datasets, the accuracies improve to a maximum of 84.5% and 75.6% for random under- and over-sampling, respectively. However, the accuracies for genuine reviews decrease to 75% and 58.8% for random under- and over-sampling, respectively. We also find that, for smaller datasets, the Adaptive Boosting ensemble model outperforms the single classifiers, whereas, for larger datasets, the performance improvement from ensemble models is insignificant compared to the best results obtained by single classifiers.
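
To make the resampling idea concrete, the sketch below balances a toy review corpus with plain random under- or over-sampling of TF-IDF features before training an Adaptive Boosting classifier and reporting per-class metrics. This is only a minimal illustration under assumed tooling (scikit-learn and NumPy) with placeholder data; it does not reproduce the paper's dynamic sampling techniques, datasets, or reported accuracies.

    # Minimal sketch, not the paper's pipeline: TF-IDF features,
    # simple random under-/over-sampling, AdaBoost classifier.
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    def random_resample(X, y, mode="under", seed=0):
        # Balance a binary training set by randomly dropping majority rows
        # ("under") or randomly duplicating minority rows ("over").
        rng = np.random.default_rng(seed)
        fake_idx = np.flatnonzero(y == 1)      # minority class (fake reviews)
        genuine_idx = np.flatnonzero(y == 0)   # majority class (genuine reviews)
        if mode == "under":
            genuine_idx = rng.choice(genuine_idx, size=fake_idx.size, replace=False)
        else:
            fake_idx = rng.choice(fake_idx, size=genuine_idx.size, replace=True)
        keep = np.concatenate([fake_idx, genuine_idx])
        rng.shuffle(keep)
        return X[keep], y[keep]

    # Placeholder corpus: 90 genuine and 10 fake reviews, mimicking class imbalance.
    texts = (["arrived on time and works as described"] * 90
             + ["best product ever!!! five stars, buy now"] * 10)
    labels = np.array([0] * 90 + [1] * 10)

    train_txt, test_txt, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=42)

    vectorizer = TfidfVectorizer(ngram_range=(1, 2))   # word n-gram textual features
    X_train = vectorizer.fit_transform(train_txt)
    X_test = vectorizer.transform(test_txt)

    X_bal, y_bal = random_resample(X_train, y_train, mode="under")
    model = AdaBoostClassifier(n_estimators=100, random_state=42).fit(X_bal, y_bal)
    print(classification_report(y_test, model.predict(X_test),
                                target_names=["genuine", "fake"]))

Passing mode="over" duplicates minority rows instead of discarding majority rows. In the study's experiments, the analogous choice trades fake-review accuracy (at most 84.5% for under-sampling versus 75.6% for over-sampling) against genuine-review accuracy (75% versus 58.8%).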
