Distinguishing between authentic and fictitious user-generated hotel reviews


Abstract

The objective of this paper is to distinguish between authentic and fictitious user-generated hotel reviews. To achieve this objective, it adopts a two-step approach. The first step seeks to classify authentic and fictitious reviews by leveraging their possible textual differences. The second step attempts to identify the textual traits that are unique to authentic and fictitious reviews. For the purpose of this paper, a ground truth dataset of 1,800 reviews, evenly divided between authentic and fictitious, was created. In the first step, authentic and fictitious reviews were classified using four forms of textual differences: understandability, level of detail, writing style, and cognition indicators. Classification was performed using voting by average probability among logistic regression, C4.5, Support Vector Machine, JRip, and Random Forest classifiers. Under five-fold cross-validation, the proposed approach was found to outperform two existing baselines. In the second step, the textual traits unique to authentic and fictitious reviews were identified using the Information Gain and Chi-squared feature selection techniques. A sequential forward feature selection approach was further adopted to identify the top five features that aid the classification of authentic and fictitious reviews. These are the use of nouns, articles, function words, punctuation, and in particular, exclamation points in reviews. The implications of the results are discussed.
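The abstract names its techniques without giving implementation detail. As a rough illustration only, the sketch below shows what voting by average probability and the two feature-scoring measures (Information Gain and Chi-squared) compute in the simplest case. It is not the authors' implementation: the function names, the 0.5 threshold, the per-classifier probabilities, and the contingency counts are all hypothetical and were chosen for illustration, not taken from the paper.

```python
import math
from statistics import mean


def average_probability_vote(probs_fictitious, threshold=0.5):
    """Combine per-classifier P(fictitious) estimates by averaging them.

    The threshold of 0.5 is an assumption made for this sketch.
    """
    avg = mean(probs_fictitious)
    return ("fictitious" if avg >= threshold else "authentic"), avg


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(table):
    """Information gain of a discrete feature with respect to the class.

    table[i][j] = number of reviews with feature value i and class j.
    """
    n = sum(sum(row) for row in table)
    class_totals = [sum(col) for col in zip(*table)]
    conditional = sum(sum(row) / n * entropy(row) for row in table if sum(row) > 0)
    return entropy(class_totals) - conditional


def chi_squared(table):
    """Pearson chi-squared statistic of a contingency table.

    Assumes every row total and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical outputs of five classifiers for a single review.
label, avg = average_probability_vote([0.9, 0.6, 0.4, 0.7, 0.8])
print(label, round(avg, 2))

# Hypothetical counts: rows = exclamation point absent / present,
# columns = authentic / fictitious.
counts = [[60, 30], [30, 60]]
print(round(information_gain(counts), 4), round(chi_squared(counts), 2))
```

Both measures score one feature at a time against the class label, which makes them suitable for ranking features. Sequential forward selection, by contrast, adds features one by one according to classifier performance and is not covered by this sketch.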


Bibliographic details

Source
International Conference on Computing Communication and Networking Technologies | 2015 | pp. 1-7 | 7 pages
Authors
Snehasish Banerjee; Alton Y. K. Chua; Jung-Jae Kim
Original format: PDF
Keywords
classification algorithms; data mining; machine learning; text analysis


Similar literature


1. Authentic versus fictitious online reviews: A textual analysis across luxury, budget, and mid-range hotels [J]. Snehasish Banerjee, Alton Y.K. Chua. Journal of Information Science. 2017, No. 1
2. Roles of negative emotions in customers' perceived helpfulness of hotel reviews on a user-generated review website: A text mining approach [J]. Lee Minwoo, Jeong Miyoung, Lee Jongseo. International Journal of Contemporary Hospitality Management. 2017, No. 2
3. Big data for big insights: Investigating language-specific drivers of hotel satisfaction with 412,784 user-generated reviews [J]. Liu Yong, Teichert Thorsten, Rossi Matti. Tourism Management. 2017, No. APRa
4. Distinguishing between Authentic and Fictitious User-generated Hotel Reviews [C]. Snehasish Banerjee, Alton Y. K. Chua, Jung-Jae Kim. International Conference on Computing, Communications and Networking Technologies. 2015
5. Enriching user and item profiles for collaborative filtering: From concept hierarchies to user-generated reviews [D]. Leung, Wing Ki Cane. 2009
6. Transcription factor competition allows embryonic stem cells to distinguish authentic signals from noise [O]. Cameron Sokolik, Yanxia Liu, David Bauer
7. Study of authentic and fictitious online reviews [O]. Banerjee Snehasish
