4th Biennial International Workshop on Balto-Slavic Natural Language Processing (2013)

A Comparison of Approaches for Sentiment Classification on Lithuanian Internet Comments

Abstract

Although many methods effectively solve the sentiment classification task for widely used languages such as English, there is no clear answer as to which methods are most suitable for languages that differ substantially from them. In this paper we attempt to solve the sentiment classification task for Lithuanian Internet comments using two classification approaches: knowledge-based and supervised machine learning. We explore the influence of sentiment word dictionaries based on different parts of speech (adjectives, adverbs, nouns, and verbs) on the knowledge-based method; of different feature types (bag-of-words, lemmas, word n-grams, character n-grams) on the machine learning methods; and of pre-processing techniques (replacing emoticons with sentiment words, replacing diacritics, etc.) on both approaches. Although the supervised machine learning methods (Support Vector Machine and Naive Bayes Multinomial) significantly outperform the proposed knowledge-based method, all obtained results are above the baseline. The best accuracy, 0.679, was achieved with Naive Bayes Multinomial using token unigrams plus bigrams, when pre-processing involved diacritics replacement.
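The shared pre-processing and a dictionary-based classifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the emoticon mapping, the tiny lexicon (`POSITIVE`, `NEGATIVE`), the three-way label set, the "count positive minus negative hits" rule, and the helper names `preprocess` and `lexicon_classify` are all hypothetical. "Diacritics replacement" is assumed to mean mapping each Lithuanian letter (ą, č, ę, ė, į, š, ų, ū, ž) to its plain Latin counterpart.

```python
import re

# Hypothetical emoticon-to-sentiment-word mapping (the abstract does not list
# the words used). "geras" = "good", "blogas" = "bad" in Lithuanian.
EMOTICONS = {
    ":-)": "geras",
    ":)": "geras",
    ":D": "geras",
    ":-(": "blogas",
    ":(": "blogas",
}

# Assumed diacritics replacement: each Lithuanian letter -> plain Latin letter.
DIACRITICS = str.maketrans("ąčęėįšųūžĄČĘĖĮŠŲŪŽ", "aceeisuuzACEEISUUZ")

# Tiny hypothetical lexicon; entries are stored diacritic-free so they match
# tokens produced with replace_diacritics=True ("saunus" = "šaunus", "great").
POSITIVE = {"geras", "gerai", "puikus", "saunus"}
NEGATIVE = {"blogas", "blogai", "baisus"}


def preprocess(text: str, replace_diacritics: bool = True) -> list[str]:
    """Replace emoticons, lowercase, optionally strip diacritics, tokenize."""
    # Replace longer emoticons first so overlapping patterns are handled consistently.
    for emoticon in sorted(EMOTICONS, key=len, reverse=True):
        text = text.replace(emoticon, f" {EMOTICONS[emoticon]} ")
    text = text.lower()
    if replace_diacritics:
        text = text.translate(DIACRITICS)
    # \w is Unicode-aware in Python 3, so letters like "ž" stay inside tokens.
    return re.findall(r"\w+", text)


def lexicon_classify(text: str) -> str:
    """Assumed rule: (#positive - #negative) lexicon hits decides the label."""
    tokens = preprocess(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(preprocess("Šaunus straipsnis :)"))  # ['saunus', 'straipsnis', 'geras']
print(lexicon_classify("Blogas :("))       # negative
```

Because the sketch does no lemmatization, an inflected form such as *gero* would not match the entry *geras*, which matters for a highly inflected language like Lithuanian.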
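The best-scoring configuration (Multinomial Naive Bayes over token unigrams plus bigrams) can be illustrated with a from-scratch sketch. It is not necessarily the toolkit implementation used in the paper, and it assumes add-one (Laplace) smoothing, lowercased `\w+` tokenization, and that features never seen in training are ignored at prediction time. The class name, the helper `ngram_features`, and the four toy training comments are hypothetical and do not come from the paper's dataset.

```python
import math
import re
from collections import Counter, defaultdict


def ngram_features(text: str) -> list[str]:
    """Token unigrams plus bigrams; bigrams are joined with a space."""
    tokens = re.findall(r"\w+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace for alpha=1) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngram_features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.n_docs = len(labels)
        return self

    def log_posterior(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(feature | label), up to a constant."""
        score = math.log(self.class_counts[label] / self.n_docs)
        total = sum(self.feature_counts[label].values())
        denom = total + self.alpha * len(self.vocab)
        for feat in ngram_features(text):
            if feat in self.vocab:  # features never seen in training are skipped
                count = self.feature_counts[label][feat]
                score += math.log((count + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


# Hypothetical toy comments, already lowercased and diacritic-free.
train_texts = [
    "labai geras filmas",
    "puikus ir geras",
    "labai blogas filmas",
    "baisus ir blogas",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("geras filmas"))   # positive
print(model.predict("blogas filmas"))  # negative
```

Adding bigrams lets the model weight word pairs such as *geras filmas* separately from their parts; with the smoothing assumed here, a pair seen in only one class still gets a small non-zero probability in the others.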
