Journal: Computer Speech and Language

Improving sentiment analysis performance on morphologically rich languages: Language and domain independent approach


Abstract

Sentiment analysis has become a widespread phenomenon with the proliferation of social media and the popularity of opinion-rich resources such as online reviews and blogs. Although the field has advanced significantly, major challenges remain, such as sentiment analysis across multiple languages or thematic domains. Only a few studies have focused on minor or morphologically rich languages. Moreover, it remains an open question whether the results of sentiment analysis can be further improved by incorporating the surrounding context (local or chronological) of the analyzed document. This paper presents a language- and domain-independent sentiment analysis model based on character n-grams, which improves the classifier's performance by utilizing surrounding context.

Four experiments on various datasets were conducted to validate the model. The datasets included a reference corpus of movie reviews in English; movie reviews in Czech; reviews of the novel Fifty Shades of Grey, the bestselling Amazon book of 2012, collected from three language versions of Amazon (English, German, and French); another reference corpus of Amazon reviews in multiple languages (German, French, and Japanese); and a multi-domain dataset (movies, books, and product categories ranging from electronics and home appliances to sports gear and supplies for hobbies and pets).

The experiments confirmed that incorporating the surrounding context is effective for datasets from various languages and domains, and they suggest that a character n-gram based model also performs strongly on multi-domain and multi-language datasets. A simple all-in-one classifier, which uses a mixture of labeled data from multiple languages (or domains) to train a sentiment classification model, may rival more sophisticated domain/language adaptation techniques. Such an approach reflects the needs of companies: in today's interconnected world, most companies operate across multiple markets and would find it difficult to obtain a specific sentiment analysis solution for each market they serve. (C) 2019 Elsevier Ltd. All rights reserved.
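
To make the idea of an all-in-one character n-gram classifier concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the classifier, the n-gram range, or how surrounding context is incorporated. The multinomial Naive Bayes model, the 2 to 4 character range, the helper names (`char_ngrams`, `NaiveBayes`), and the toy training sentences are all assumptions made for illustration, and surrounding context is not modeled.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of length n_min..n_max (assumed range)."""
    padded = f" {text.lower()} "  # pad so word boundaries become features
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (a stand-in classifier)."""

    def __init__(self):
        self.counts = {}             # label -> Counter of n-gram counts
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text)
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            total = sum(counter.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in grams:
                score += math.log((counter[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: one model trained on a mixture of languages,
# with no language identification or per-language preprocessing.
train = [
    ("a wonderful, moving film", "pos"),
    ("boring and far too long", "neg"),
    ("ein wunderbarer Film", "pos"),
    ("langweilig und viel zu lang", "neg"),
    ("skvělý film", "pos"),
    ("nudný a příliš dlouhý", "neg"),
]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("wunderbar"))
```

Because the features are raw character sequences, no tokenizer, stemmer, or lemmatizer is needed, which is one reason such features are often considered for morphologically rich languages: inflected forms of a word still share most of their n-grams.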
