Conference: International Conference on Computing Communication and Networking Technologies

Distinguishing between authentic and fictitious user-generated hotel reviews

Abstract

The objective of this paper is to distinguish between authentic and fictitious user-generated hotel reviews. To achieve this, it adopts a two-step approach. The first step classifies authentic and fictitious reviews by leveraging their possible textual differences. The second step identifies the textual traits that are unique to authentic and to fictitious reviews. For this purpose, a ground-truth dataset of 1,800 reviews, split evenly between authentic and fictitious (900 each), was created. In the first step, reviews were classified using four forms of textual differences: understandability, level of detail, writing style, and cognition indicators. Classification was performed by voting on the average probability across logistic regression, C4.5, Support Vector Machine, JRip, and Random Forest classifiers. Under five-fold cross-validation, the proposed approach outperformed two existing baselines. In the second step, the textual traits unique to authentic and fictitious reviews were identified using the Information Gain and Chi-squared feature selection techniques. A sequential forward feature selection approach was then adopted to identify the top five features that aid the classification of authentic and fictitious reviews: the use of nouns, articles, function words, punctuation, and, in particular, exclamation points. The implications of the results are discussed.
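
The classification step can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: the word lists, the feature names, and the `extract_features` / `soft_vote` helpers are hypothetical, only a few of the writing-style cues are approximated (noun counts would need a part-of-speech tagger), and each trained classifier is abstracted as a callable returning the probability that a review is fictitious. What it shows is the combination rule: voting by average probability averages the members' class probabilities and assigns the class whose mean probability wins.

```python
import re
import string
from statistics import mean
from typing import Callable, Dict, Sequence

# Hypothetical, illustrative word lists; the paper's actual lexicons are not stated in the abstract.
ARTICLES = {"a", "an", "the"}
FUNCTION_WORDS = ARTICLES | {
    "and", "but", "or", "of", "in", "on", "at", "to", "for",
    "with", "is", "was", "it", "this", "that",
}


def extract_features(review: str) -> Dict[str, float]:
    """Per-token rates for a few of the stylistic cues named in the abstract (hypothetical feature set)."""
    tokens = re.findall(r"[a-z']+", review.lower())
    n = max(len(tokens), 1)  # guard against division by zero on empty reviews
    return {
        "articles": sum(t in ARTICLES for t in tokens) / n,
        "function_words": sum(t in FUNCTION_WORDS for t in tokens) / n,
        "punctuation": sum(ch in string.punctuation for ch in review) / n,
        "exclamations": review.count("!") / n,
    }


# Assumption: any trained model is wrapped as a callable returning P(fictitious | features).
Classifier = Callable[[Dict[str, float]], float]


def soft_vote(classifiers: Sequence[Classifier], features: Dict[str, float],
              threshold: float = 0.5) -> str:
    """Voting by average probability: average the members' P(fictitious), then threshold the mean."""
    p_fictitious = mean(clf(features) for clf in classifiers)
    return "fictitious" if p_fictitious >= threshold else "authentic"


if __name__ == "__main__":
    # Three toy stand-ins for trained members; the paper combines five
    # (logistic regression, C4.5, SVM, JRip, Random Forest).
    members = [
        lambda f: min(1.0, 10 * f["exclamations"]),
        lambda f: 0.7 if f["articles"] < 0.05 else 0.3,
        lambda f: 0.5,
    ]
    for text in ("Amazing stay!!! Loved it!",
                 "The room was clean and the staff were helpful."):
        print(text, "->", soft_vote(members, extract_features(text)))
```

With two classes and a 0.5 threshold, this is equivalent to picking the class with the highest mean probability. In a real pipeline the members would be trained models evaluated with five-fold cross-validation, as in the paper; scikit-learn's `VotingClassifier` with `voting="soft"` applies the same averaging rule to fitted estimators.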
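
The feature-ranking step can be illustrated in the same hedged way. The sketch below, again stdlib-only and with hypothetical helper names, scores a discrete (for example, binarized) feature against the authentic/fictitious label with information gain, IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v), and with the Pearson chi-squared statistic of the feature-by-label contingency table, then runs greedy sequential forward selection. The abstract does not say which classifier or evaluation was used inside forward selection, so `cv_accuracy` (five-fold accuracy of a naive lookup classifier) is an assumed stand-in.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_squared(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson chi-squared statistic of the feature-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    x_counts, y_counts = Counter(feature), Counter(labels)
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


Row = Dict[str, int]


def cv_accuracy(rows: List[Row], labels: List[int], features: List[str], folds: int = 5) -> float:
    """k-fold accuracy of a naive lookup classifier that predicts the majority training
    label for each observed feature pattern (hypothetical stand-in, not the paper's ensemble)."""
    correct = 0
    for fold in range(folds):
        train = [i for i in range(len(rows)) if i % folds != fold]
        test = [i for i in range(len(rows)) if i % folds == fold]
        votes: Dict[tuple, Counter] = {}
        for i in train:
            key = tuple(rows[i][f] for f in features)
            votes.setdefault(key, Counter())[labels[i]] += 1
        fallback = Counter(labels[i] for i in train).most_common(1)[0][0]
        for i in test:
            key = tuple(rows[i][f] for f in features)
            pred = votes[key].most_common(1)[0][0] if key in votes else fallback
            correct += pred == labels[i]
    return correct / len(rows)


def forward_select(candidates: Sequence[str], evaluate: Callable[[List[str]], float], k: int) -> List[str]:
    """Greedy sequential forward selection: repeatedly add the candidate that
    maximises evaluate(selected + [candidate]) until k features are chosen."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: "exclaim" tracks the label exactly, "noise" is independent of it.
    labels = [1, 0] * 5  # 1 = fictitious, 0 = authentic
    noise = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
    rows = [{"exclaim": y, "noise": z} for y, z in zip(labels, noise)]
    for name in ("exclaim", "noise"):
        col = [r[name] for r in rows]
        print(name, information_gain(col, labels), chi_squared(col, labels))
    print(forward_select(["noise", "exclaim"], lambda fs: cv_accuracy(rows, labels, fs), k=1))
```

Filter scores such as information gain and chi-squared rank each feature in isolation, whereas forward selection evaluates features jointly, at the cost of O(k * m) evaluations for k selected features out of m candidates.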