iSRD: Spam review detection with imbalanced data distributions

Abstract

The Internet plays an essential role in modern information systems. Applications such as e-commerce websites have made it common for people to purchase many types of products online. When shopping online, users often rely on reviews from previous customers to make their final decision. Because online reviews strongly influence the sale of online products and services, some vendors (or customers) post fake or spam reviews to mislead other customers. False reviews can lead to unfair market competition and financial loss for customers or vendors. In this research, we aim to distinguish spam from non-spam reviews using supervised classification methods. A challenge in training such a classifier is that spam reviews make up only a very small portion of online reviews. This leads to a data imbalance problem: learning methods that do not emphasize minority samples (i.e., spam) may detect spam reviews poorly even when their overall accuracy is relatively high. To tackle this challenge, we employ a bagging-based approach to build a number of balanced datasets, train a spam classifier on each, and use the ensemble of these classifiers to detect review spam. Experiments and comparisons demonstrate that our method, iSRD, outperforms baseline methods for review spam detection.
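The idea in the abstract of building several balanced datasets and combining the classifiers trained on them can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the base learner, the features, the sampling details, or how the ensemble votes. The sketch assumes each bag keeps every minority (spam) sample plus an equally sized random undersample of the majority class, uses a nearest-centroid classifier as a hypothetical stand-in base learner, and combines models by majority vote. All function names and the toy data are illustrative.

```python
import random
from collections import Counter

SPAM, HAM = 1, 0  # label encoding assumed for this sketch


def balanced_bags(majority, minority, n_bags, rng):
    """Yield n_bags datasets: all minority samples plus an equally
    sized random undersample (without replacement) of the majority."""
    for _ in range(n_bags):
        yield rng.sample(majority, len(minority)) + list(minority)


def train_centroid(samples):
    """Stand-in base learner: the mean feature vector of each class."""
    sums, counts = {}, Counter()
    for x, y in samples:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the label whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist2(model[y]))


def train_ensemble(data, n_bags=11, seed=0):
    """Train one base classifier per balanced bag."""
    rng = random.Random(seed)
    spam = [s for s in data if s[1] == SPAM]
    ham = [s for s in data if s[1] == HAM]
    minority, majority = (spam, ham) if len(spam) <= len(ham) else (ham, spam)
    return [train_centroid(bag)
            for bag in balanced_bags(majority, minority, n_bags, rng)]


def predict_ensemble(models, x):
    """Majority vote over the base classifiers (n_bags odd avoids ties)."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy imbalanced data: 200 non-spam points near (0, 0), 10 spam near (3, 3).
rng = random.Random(42)
ham = [([rng.gauss(0, 0.5), rng.gauss(0, 0.5)], HAM) for _ in range(200)]
spam = [([rng.gauss(3, 0.5), rng.gauss(3, 0.5)], SPAM) for _ in range(10)]
models = train_ensemble(ham + spam)
print(predict_ensemble(models, [3.0, 3.0]), predict_ensemble(models, [0.0, 0.0]))
```

Because every bag is balanced, no single base classifier can reach high accuracy simply by predicting the majority class, which is the failure mode the abstract warns about. In a real review-spam setting the two-dimensional toy features would be replaced by text-derived features and a stronger base learner.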

Bibliographic details

Source
IEEE International Conference on Information Reuse and Integration; Workshop on information reuse and integration in health informatics; IEEE international workshop on formal methods integration; IEEE international workshop on data integration and mining; IEEE international workshop on empirical methods for recognizing inference in text; International workshop on issues and challenges in social computing; Workshop on social network security; International workshop on information integration in cyber physical systems; Workshop on advances in nature-inspired information security: Science, engineering and economics, 2014, pp. 553-560 (8 pages)
Conference location: Redwood City, CA (US)
Authors
Al Najada Hamzah; Xingquan Zhu

Affiliation
Dept. of Comput. Electr. Eng. Comput. Sci., Florida Atlantic Univ., Boca Raton, FL, USA

Original format: PDF
Keywords
Data sampling; classification; fake reviews; imbalanced data distributions; sentiment analysis

Similar literature

1. Improving Knowledge Based Spam Detection Methods: The Effect of Malicious Related Features in Imbalance Data Distribution [J]. Jafar Alqatawna, Hossam Faris, Khalid Jaradat. International Journal of Communications, Network and System Sciences. 2015, No. 5

2. Effective Opinion Spam Detection: A Study on Review Metadata Versus Content [J]. Ajay Rastogi, Monica Mehrotra, Syed Shafat Ali. Journal of Data and Information Science. 2020, No. 2

3. iSRD: Spam review detection with imbalanced data distributions [C]. Al Najada Hamzah, Xingquan Zhu. IEEE International Conference on Information Reuse and Integration. 2014

4. Text classification on imbalanced data: Application to systematic reviews automation [D]. Ma, Yimin. 2007

5. The effects of varying class distribution on learner behavior for medicare fraud detection with imbalanced big data [O]. Richard A. Bauder, Taghi M. Khoshgoftaar. 2018

6. Effective Opinion Spam Detection: A Study on Review Metadata Versus Content [O]. Ajay Rastogi, Monica Mehrotra, Syed Shafat Ali. 2020
