Conference: International Conference on Language Resources and Evaluation

Data Augmentation using Machine Translation for Fake News Detection in the Urdu Language


Abstract

The task of fake news detection is to distinguish legitimate news articles that describe real facts from those that convey deceptive and fictitious information. Because the fake news phenomenon is present across all languages, it is crucial to be able to solve this problem efficiently for languages other than English. A common approach to this task is supervised classification using features of varying complexity. Yet supervised machine learning requires a substantial amount of annotated data. For English and a small number of other languages, annotated data is far more available, whereas for the vast majority of languages it is scarce. We investigate whether machine translation in its present state can be used successfully as an automated technique for creating and augmenting annotated corpora for fake news detection, focusing on the English-Urdu language pair. We train a fake news classifier for Urdu on (1) a manually annotated dataset originally in Urdu and (2) the machine-translated version of an existing annotated fake news dataset originally in English. We show that, at the present quality of machine translation for the English-Urdu language pair, fully automated data augmentation through machine translation did not improve fake news detection in Urdu.
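The comparison described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifier or the features, so a small whitespace-token Naive Bayes model stands in for them, and the function names (`train_nb`, `predict_nb`, `accuracy`, `compare_conditions`) and the three training conditions are hypothetical. The only idea taken from the abstract is that models trained on native Urdu data and on machine-translated data are compared on Urdu text.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Train a multinomial Naive Bayes model on (text, label) pairs.

    Stand-in classifier (assumption): the abstract does not say which
    model or features the authors used.
    """
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in text.split():
            token_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, token_counts, vocab


def predict_nb(model, text):
    """Return the most probable label for text, using add-one smoothing."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in text.split():
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of (text, label) pairs that the model labels correctly."""
    correct = sum(predict_nb(model, text) == label for text, label in examples)
    return correct / len(examples)


def compare_conditions(native_train, translated_train, native_test):
    """Train one model per condition; evaluate each on native test data.

    The three conditions are an assumed experimental layout, not a
    description of the paper's exact protocol.
    """
    conditions = {
        "native_only": native_train,
        "translated_only": translated_train,
        "augmented": native_train + translated_train,
    }
    return {
        name: accuracy(train_nb(data), native_test)
        for name, data in conditions.items()
    }
```

Each argument is a list of `(text, label)` pairs. Evaluating every condition on the same held-out native-language test set is what makes the scores comparable: it shows whether the translated examples help on the text the classifier is actually meant to handle.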
