Fifth International Workshop on Natural Language Processing for Social Media
A Twitter Corpus and Benchmark Resources for German Sentiment Analysis

Abstract

In this paper we present SB10k, a new corpus for sentiment analysis with approx. 10,000 German tweets. We use this new corpus and two existing corpora to provide state-of-the-art benchmarks for sentiment analysis in German: we implemented a CNN (based on the winning system of SemEval-2016) and a feature-based SVM and compare their performance on all three corpora. For the CNN, we also created German word embeddings trained on 300M tweets. These word embeddings were then optimized for sentiment analysis using distant-supervised learning. The new corpus, the German word embeddings (plain and optimized), and source code to re-run the benchmarks are publicly available.