Journal: Computer Speech and Language

Improving sentiment analysis performance on morphologically rich languages: Language and domain independent approach

Abstract

Sentiment analysis has become a widespread phenomenon with the proliferation of social media and the popularity of opinion-rich resources such as online reviews and blogs. Although the field has advanced significantly, major challenges remain, notably sentiment analysis across multiple languages or thematic domains. Only a few studies have focused on minor or morphologically rich languages. Moreover, it remains an open question whether the results of sentiment analysis can be further improved by incorporating the surrounding context (local or chronological) of the analyzed document. This paper presents a language- and domain-independent sentiment analysis model based on character n-grams, which improves the classifier's performance by utilizing the surrounding context. Four experiments on various datasets were conducted to validate the model. The datasets included a reference corpus of movie reviews in English; movie reviews in Czech; reviews of the novel Fifty Shades of Grey, the bestselling Amazon book of 2012, collected from three Amazon language versions (English, German, and French); another reference corpus of Amazon reviews in multiple languages (German, French, and Japanese); and a multi-domain dataset (movies, books, and product categories ranging from electronics and home appliances to sports gear and supplies for hobbies and pets). The experiments confirmed that incorporating the surrounding context is effective for datasets from various languages and domains, and they suggest that a character n-gram based model also performs strongly on multi-domain and multi-language datasets. A simple all-in-one classifier, which uses a mixture of labeled data from multiple languages (or domains) to train a sentiment classification model, may rival more sophisticated domain/language adaptation techniques. Such an approach reflects the needs of companies: in today's interconnected world, most companies operate across multiple markets and would find it difficult to obtain a specific sentiment analysis solution for each market they serve. (C) 2019 Elsevier Ltd. All rights reserved.
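
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of an all-in-one classifier over character n-grams. It assumes character 2- and 3-grams and a multinomial Naive Bayes classifier with add-one smoothing; neither choice is stated in the abstract, and the paper's actual features and classifier may differ. The function `char_ngrams`, the class `CharNgramNB`, and the toy reviews are hypothetical names and data introduced purely for illustration. The surrounding-context component of the model is not sketched here.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=3):
    """Return all character n-grams of length n_min..n_max.

    No tokenization or stemming is applied, which is what makes the
    features independent of any particular language.
    """
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over character n-gram counts (illustrative only)."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter(labels)         # label -> number of documents
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, grams in self.counts.items():
            denom = sum(grams.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)
            for g in char_ngrams(text):
                if g in self.vocab:  # skip n-grams never seen in training
                    score += math.log((grams[g] + 1) / denom)  # add-one smoothing
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: English, German, and French reviews mixed into
# one training set, with no per-language model or language identification.
train = [
    ("great book, loved it", "pos"),
    ("tolles Buch, sehr gut", "pos"),
    ("excellent livre, superbe", "pos"),
    ("terrible book, boring", "neg"),
    ("schlechtes Buch, langweilig", "neg"),
    ("mauvais livre, ennuyeux", "neg"),
]
model = CharNgramNB().fit(*zip(*train))
print(model.predict("sehr gut"))
```

Because the features are raw character sequences, the same code path handles every language in the mixture. A toy set this small only shows the mechanics and says nothing about the accuracy reported in the paper.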
