
A Tool for Fake News Detection

Abstract

Oxford Dictionaries named "post-truth" its Word of the Year for 2016. The word is an adjective relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief. This leads to misinformation and problems in society. Hence, it is important to make an effort to detect such misinformation and prevent it from spreading. In this paper we propose machine learning techniques, in particular supervised learning, for fake news detection. More precisely, we used a dataset of fake and real news to train a machine learning model using the Scikit-learn library in Python. We extracted features from the dataset using text representation models such as Bag-of-Words, Term Frequency-Inverse Document Frequency (TF-IDF) and Bi-gram frequency. We tested two classification approaches, namely probabilistic classification and linear classification, on the title and on the content, checking whether the title is clickbait or non-clickbait and whether the content is fake or real, respectively. Our experiments showed that linear classification works best with the TF-IDF model for content classification. The Bi-gram frequency model gave the lowest accuracy for title classification in comparison with Bag-of-Words and TF-IDF.
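The three text representations named in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' code: the paper used Scikit-learn, whereas the helper names (`tokenize`, `bag_of_words`, `bigram_counts`, `tf_idf`), the whitespace tokenizer, and the toy headlines below are assumptions made for illustration only. The TF-IDF here is the plain textbook form, term frequency times log(N / document frequency); Scikit-learn's `TfidfVectorizer` by default applies a smoothed IDF and L2 normalization, so its numbers would differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(tokens):
    """Raw term counts for one document."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Counts of adjacent token pairs for one document."""
    return Counter(zip(tokens, tokens[1:]))


def tf_idf(docs_tokens):
    """Textbook TF-IDF: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy headlines, not taken from the paper's dataset.
titles = [
    "you will not believe this shocking trick",
    "parliament passes annual budget bill",
    "this shocking photo will amaze you",
]
docs = [tokenize(t) for t in titles]
bow = [bag_of_words(d) for d in docs]
bigrams = [bigram_counts(d) for d in docs]
tfidf = tf_idf(docs)
```

In this sketch a term that occurs in only one headline, such as "budget", receives a higher TF-IDF weight than a term shared by two headlines, such as "shocking". Down-weighting common terms in this way is the property that separates TF-IDF from raw Bag-of-Words counts. Each of the three outputs could serve as the feature input to a probabilistic or a linear classifier.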