Conference: Irish Signals and Systems Conference

Fake News Detection on Reddit Utilising CountVectorizer and Term Frequency-Inverse Document Frequency with Logistic Regression, MultinominalNB and Support Vector Machine


Abstract

The spread of misleading information, or fake news, has recently become a problem for society. On social media, where anyone can share opinions and beliefs and present them as fact, fake news threatens the reputation of companies and individuals. The 2016 USA Presidential election drew increased attention to the generation of fake news articles, leading a large number of researchers and scientists to explore this Natural Language Processing research area with a sense of urgency and keen interest. However, investigation into what people consume from social media is at an early stage, and efforts are in progress to explore how people can separate disinformation from truthful content. The primary challenge in fake news detection is determining how to detect it. Supervised learning methods help to detect these stories by using labelled data to determine whether a text is real or fake. This research develops and compares supervised learning models using Logistic Regression, MultinomialNB, and Support Vector Machine with CountVectorizer and Term Frequency-Inverse Document Frequency (TF-IDF) methods on Reddit data. The research concludes that the model combining CountVectorizer and MultinomialNB achieved the highest accuracy on the Reddit dataset.
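As a rough illustration of the kind of comparison the abstract describes, the following is a minimal standard-library sketch of bag-of-words counts and TF-IDF weights feeding a multinomial Naive Bayes classifier. It is not the authors' code: the four training sentences, the labels, the helper names (`tokenize`, `fit_idf`, `vectorize`, `MultinomialNaiveBayes`) and the smoothed IDF formula are all assumptions made for the example, and Logistic Regression and Support Vector Machine are left out for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenizer (an assumption; real pipelines do more).
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(docs, idf=None):
    # Raw term counts when idf is None, otherwise count * idf per term.
    rows = []
    for d in docs:
        counts = Counter(tokenize(d))
        if idf is None:
            rows.append(dict(counts))
        else:
            rows.append({t: c * idf[t] for t, c in counts.items() if t in idf})
    return rows


class MultinomialNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for r in rows for t in r}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_rows = [r for r, y in zip(rows, labels) if y == c]
            self.log_prior[c] = math.log(len(class_rows) / len(labels))
            totals = Counter()
            for r in class_rows:
                for t, w in r.items():
                    totals[t] += w
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((totals[t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, rows):
        preds = []
        for r in rows:
            # Terms never seen in training are ignored.
            scores = {
                c: self.log_prior[c]
                + sum(w * self.log_lik[c][t] for t, w in r.items() if t in self.vocab)
                for c in self.classes
            }
            preds.append(max(scores, key=scores.get))
        return preds


# Hypothetical toy data, invented for this sketch (not the Reddit dataset).
train_texts = [
    "scientists publish peer reviewed study on vaccine safety",
    "city council approves budget for new public library",
    "shocking miracle cure doctors hate revealed",
    "you will not believe this shocking celebrity secret",
]
train_labels = ["real", "real", "fake", "fake"]
test_texts = ["shocking secret miracle revealed", "council publish study on budget"]

# Variant 1: raw counts. Variant 2: TF-IDF weights fitted on the training texts.
count_model = MultinomialNaiveBayes().fit(vectorize(train_texts), train_labels)
count_preds = count_model.predict(vectorize(test_texts))

idf = fit_idf(train_texts)
tfidf_model = MultinomialNaiveBayes().fit(vectorize(train_texts, idf), train_labels)
tfidf_preds = tfidf_model.predict(vectorize(test_texts, idf))

print("counts:", count_preds)
print("tf-idf:", tfidf_preds)
```

On real data, the two feature variants would be compared by accuracy on a held-out split for each classifier; a toy set this small says nothing about which combination performs better.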
