International Conference on Informatics and Computing

Sentiment Analysis Of Hotel Review Using N-Gram And Naive Bayes Methods

Abstract

Hotel booking service providers, whether websites or online applications, offer features that let consumers post reviews assessing the hotels they have stayed in. However, the sheer number of available reviews makes it impractical for users to sift through all of them. Sentiment analysis can address this problem by classifying reviews as expressing positive or negative sentiment. This study examines the application of n-gram and Naive Bayes methods to sentiment classification. The research consists of six phases: (1) hotel review data were collected from the TripAdvisor.com website; (2) the data were preprocessed by data cleaning and case folding; (3) the text was tokenized with the n-gram method, using unigrams, bigrams, and trigrams; (4) terms were weighted with the Term Frequency-Inverse Document Frequency (TF-IDF) method; (5) the hotel reviews were classified as positive or negative with the Naive Bayes method; and (6) the performance of the algorithm was evaluated with a confusion matrix, which yields precision, recall, accuracy, and error rate. Classification using Naive Bayes with unigram tokenization achieved a precision of 94%, recall of 100%, accuracy of 97%, and error rate of 3%. With bigrams, it achieved a precision of 89%, recall of 94%, accuracy of 92%, and error rate of 8%. With trigrams, it achieved a precision of 52%, recall of 80%, accuracy of 58%, and error rate of 42%. Based on accuracy, unigram tokenization outperforms the other tokenization methods.
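Phases (2) and (3), preprocessing followed by n-gram tokenization, can be sketched as below. The abstract does not spell out the cleaning rules, so the ones here (lowercase, then keep only ASCII letters and whitespace) are assumptions, and the function names `preprocess` and `ngrams` are hypothetical.

```python
import re


def preprocess(text: str) -> str:
    """Case folding plus data cleaning (assumed rules: keep only a-z and whitespace)."""
    text = text.lower()                        # case folding
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits, punctuation, symbols
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


def ngrams(text: str, n: int) -> list[str]:
    """Word n-grams of a cleaned review: n=1 unigram, n=2 bigram, n=3 trigram."""
    words = preprocess(text).split()
    # Slide a window of n words; yields an empty list if the review is shorter than n.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


review = "The room was VERY clean, staff friendly!"
print(ngrams(review, 1))  # 7 unigrams: 'the', 'room', ..., 'friendly'
print(ngrams(review, 2))  # 6 bigrams: 'the room', 'room was', ...
print(ngrams(review, 3))  # 5 trigrams: 'the room was', ...
```

Longer n-grams produce fewer, sparser features per review, so each trigram is seen in fewer training examples. This is one plausible reason for the lower trigram scores, though the abstract itself does not analyze the cause.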
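Phase (4), TF-IDF term weighting, has several common variants, and the abstract does not say which one was used. The sketch below assumes raw term frequency multiplied by an unsmoothed inverse document frequency, w(t, d) = tf(t, d) · log(N / df(t)). The function name `tf_idf` and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by raw count in the review times log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per review
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = [["clean", "room"], ["dirty", "room"], ["clean", "clean", "staff"]]
w = tf_idf(docs)
print(w[1]["dirty"])  # log(3/1): "dirty" occurs in only one of three reviews
print(w[2]["clean"])  # 2 * log(3/2): two occurrences, present in two reviews
```

Under this variant, a term that appears in every review receives weight 0. Libraries often add smoothing to the IDF term to avoid this.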
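Phase (5) names only "the Naive Bayes method." A common choice for weighted text features is multinomial Naive Bayes with Laplace smoothing, which is what this sketch assumes (alpha = 1.0). It accepts per-review term weights, which can be raw counts or TF-IDF values. The class name and the four toy training reviews are hypothetical and are not the study's data.

```python
import math
from collections import defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over per-review term weights (counts or TF-IDF).

    Assumptions: add-alpha (Laplace) smoothing, and terms unseen in training
    are ignored at prediction time.
    """

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[dict[str, float]], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {term for doc in docs for term in doc}
        n = len(labels)
        # Prior P(c): fraction of training reviews carrying label c.
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        # Total weight of each term within each class.
        totals = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for term, weight in doc.items():
                totals[label][term] += weight
        # Smoothed likelihood P(t | c) = (W_ct + alpha) / (sum_t W_ct + alpha * |V|).
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((totals[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, doc: dict[str, float]) -> str:
        # Score in log space: log P(c) + sum of weight * log P(t | c) over known terms.
        def score(c: str) -> float:
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in doc.items() if t in self.vocab
            )
        return max(self.classes, key=score)


train_docs = [
    {"clean": 1.0, "friendly": 1.0},
    {"great": 1.0, "clean": 1.0},
    {"dirty": 1.0, "rude": 1.0},
    {"noisy": 1.0, "dirty": 1.0},
]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict({"clean": 1.0}))               # positive
print(model.predict({"dirty": 1.0, "rude": 1.0}))  # negative
```

The model works in log space so that long reviews, which produce many small probabilities, do not underflow to zero.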
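The phase (6) metrics follow directly from the four confusion-matrix counts: precision = TP / (TP + FP), recall = TP / (TP + FN), accuracy = (TP + TN) / total, and error rate = 1 - accuracy. The reported figures match that last identity, since each error rate equals 100% minus the corresponding accuracy. This sketch assumes "positive" is the positive class. The function name `evaluate` and the example labels are hypothetical.

```python
def evaluate(y_true: list[str], y_pred: list[str], positive: str = "positive") -> dict[str, float]:
    """Binary confusion-matrix metrics, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }


y_true = ["positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "positive", "negative"]
print(evaluate(y_true, y_pred))  # precision 2/3, recall 1.0, accuracy 0.75, error_rate 0.25
```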