SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles

Abstract

In this work, we introduce a corpus for satire detection in Romanian news. We gathered 55,608 public news articles from multiple real and satirical news sources, composing one of the largest corpora for satire detection regardless of language and the only one for the Romanian language. We provide an official split of the text samples, such that training news articles belong to different sources than test news articles, thus ensuring that models do not achieve high performance simply by overfitting to source-specific cues. We conduct experiments with two state-of-the-art deep neural models, resulting in a set of strong baselines for our novel corpus. Our results show that the machine-level accuracy for satire detection in Romanian is quite low (under 73% on the test set) compared to the human-level accuracy (87%), leaving ample room for improvement in future research.
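The source-disjoint split described in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure or the official SaRoCo split; the function name `split_by_source`, the record keys `source`, `label`, and `text`, and the toy data are all hypothetical, and the sketch assumes each source is either entirely real or entirely satirical.

```python
import random
from collections import defaultdict


def split_by_source(articles, test_fraction=0.25, seed=0):
    """Split articles so that no news source appears in both train and test.

    `articles` is a list of dicts with hypothetical keys 'source' and 'label'.
    Sources are sampled separately per label so both classes reach the test set.
    """
    # Collect the set of sources seen for each label (0 = real, 1 = satire).
    sources_by_label = defaultdict(set)
    for art in articles:
        sources_by_label[art["label"]].add(art["source"])

    rng = random.Random(seed)
    test_sources = set()
    for label in sorted(sources_by_label):
        sources = sorted(sources_by_label[label])
        rng.shuffle(sources)
        # Hold out at least one whole source per label for testing.
        n_test = max(1, round(len(sources) * test_fraction))
        test_sources.update(sources[:n_test])

    train = [a for a in articles if a["source"] not in test_sources]
    test = [a for a in articles if a["source"] in test_sources]
    return train, test


# Toy data: three real sources and two satirical ones, three articles each.
toy = [
    {"source": s, "label": lab, "text": f"article {i} from {s}"}
    for s, lab in [("real_a", 0), ("real_b", 0), ("real_c", 0),
                   ("sat_x", 1), ("sat_y", 1)]
    for i in range(3)
]
train, test = split_by_source(toy)
```

Splitting on whole sources rather than on individual articles means a classifier cannot score well merely by memorizing a publication's house style, which is the concern the abstract raises.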
