Journal: Procedia Computer Science

Word2Vec for Indonesian Sentiment Analysis towards Hotel Reviews: An Evaluation Study


Abstract

Hand-crafted feature engineering is a labor-intensive and costly task. In this paper, we implement Word2Vec as an alternative to hand-crafted features for sentiment analysis of hotel reviews in the Indonesian language. To obtain the highest sentiment analysis performance, we evaluate three Word2Vec parameters: the model architecture, the evaluation method, and the vector dimension. The evaluation was carried out on our proposed domain-specific corpus, which consists of 2500 Indonesian-language hotel reviews (1250 positive and 1250 negative). The results show that the highest accuracy is obtained with the following combination of parameters: the Skip-gram architecture, Hierarchical Softmax as the evaluation method, and a vector dimension of 100. The Skip-gram model yields the highest accuracy for words that rarely appear, as is the case in sentiment analysis tasks. Hierarchical Softmax provides better results because, during training, it uses a binary tree to represent all of the words in the vocabulary, with leaf nodes representing rare words, so that rarely appearing words inherit vector representations from the tree. Furthermore, to obtain optimal accuracy, the vector dimension and the amount of data should be increased together.
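The binary tree mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the tree is built. The code below assumes a Huffman tree built from word counts, as the reference word2vec implementation does for Hierarchical Softmax. In such a tree every vocabulary word is a leaf, and rarer words sit deeper, so they share more internal nodes with other words on the path from the root. The function name `huffman_code_lengths` and the toy Indonesian tokens are hypothetical and are not taken from the paper or its corpus.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(frequencies):
    """Return {word: depth of its leaf} for a Huffman tree built from word counts."""
    tie = count()  # unique tie-breaker so the heap never has to compare dicts
    heap = [(freq, next(tie), {word: 0}) for word, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every leaf inside moves one level down.
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {word: depth + 1 for word, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


# Toy review tokens, made up for illustration (not from the paper's corpus).
tokens = (
    "kamar bersih kamar nyaman kamar bersih pelayanan ramah "
    "kamar bersih pelayanan lambat kamar kotor"
).split()
counts = Counter(tokens)
depths = huffman_code_lengths(counts)

for word, freq in counts.most_common():
    print(f"{word:10s} count={freq} path_length={depths[word]}")
```

In this toy vocabulary the most frequent word gets the shortest root-to-leaf path and the words seen only once get longer ones. This sketch only builds the tree; it does not train any word vectors.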