Data Augmentation using Machine Translation for Fake News Detection in the Urdu Language

Abstract

The task of fake news detection is to distinguish legitimate news articles that describe real facts from those that convey deceptive and fictitious information. As the fake news phenomenon is present across all languages, it is crucial to be able to solve this problem efficiently for languages other than English. A common approach to this task is supervised classification using features of varying complexity. Yet supervised machine learning requires a substantial amount of annotated data. For English and a small number of other languages, annotated data is far more available, whereas for the vast majority of languages it is scarce. We investigate whether machine translation in its present state can be used successfully as an automated technique for creating and augmenting annotated corpora for fake news detection, focusing on the English-Urdu language pair. We train a fake news classifier for Urdu on (1) a manually annotated dataset originally in Urdu and (2) the machine-translated version of an existing annotated fake news dataset originally in English. We show that, at the present level of machine translation quality for the English-Urdu language pair, fully automated data augmentation through machine translation did not improve fake news detection in Urdu.
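The experimental setup described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the classifier or the features, so the character-bigram Naive Bayes model, the helper names (`char_ngrams`, `NaiveBayes`, `train`, `accuracy`), and the toy placeholder strings are all assumptions made for the example.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Character n-grams; script-agnostic, so usable for Urdu text too."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (assumed model)."""

    def fit(self, texts, labels):
        self.priors = Counter(labels)   # label -> number of documents
        self.counts = {}                # label -> Counter of n-grams
        for text, label in zip(texts, labels):
            self.counts.setdefault(label, Counter()).update(char_ngrams(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        def score(label):
            grams = self.counts[label]
            total = sum(grams.values()) + len(self.vocab)
            # Unnormalised log-prior: the shared constant does not change argmax.
            s = math.log(self.priors[label])
            for g in char_ngrams(text):
                s += math.log((grams[g] + 1) / total)
            return s
        return max(self.counts, key=score)


def train(data):
    texts, labels = zip(*data)
    return NaiveBayes().fit(texts, labels)


def accuracy(model, data):
    return sum(model.predict(t) == y for t, y in data) / len(data)


# Hypothetical placeholder data, NOT taken from the paper's corpora.
original_urdu = [("aaaa", "fake"), ("bbbb", "real")]       # manually annotated
translated_en = [("aaab", "fake"), ("bbba", "real")]       # machine-translated
held_out_urdu = [("aaaa", "fake"), ("bbbb", "real")]       # original-Urdu test set

baseline = train(original_urdu)                    # setting (1): Urdu only
augmented = train(original_urdu + translated_en)   # setting (1) + (2)

baseline_acc = accuracy(baseline, held_out_urdu)
augmented_acc = accuracy(augmented, held_out_urdu)
```

Evaluating both models on the same held-out set of original Urdu articles is what makes the comparison meaningful: augmentation helps only if `augmented_acc` exceeds `baseline_acc`, which the abstract reports was not the case for the real data.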

Bibliographic record

Source
International Conference on Language Resources and Evaluation, 2020, pp. 2537-2542 (6 pages)
Authors
Maaz Amjad; Grigori Sidorov; Alisa Zhila
Original format
PDF
Keywords
fake news detection; Urdu language; language resources; data augmentation; benchmark dataset; classification
