Classification for Authorship of Tweets by Comparing Logistic Regression and Naive Bayes Classifiers

Abstract

At a time when all it takes to open a Twitter account is a mobile phone, authenticating information encountered on social media becomes very complex, especially when we lack measures to verify digital identities in the first place. Because the platform supports anonymity, fake news generated by dubious sources has been observed to travel much faster and farther than real news. Hence, we need valid measures to identify authors of misinformation to avert these consequences. Researchers have proposed different authorship attribution techniques to approach this kind of problem. However, because tweets are limited to 280 characters, finding a suitable authorship attribution technique is a challenge. This research aims to classify authors of tweets by comparing two machine learning methods, logistic regression and naive Bayes. The process consists of fetching tweets, pre-processing, feature extraction, and developing a machine learning model for classification. This paper illustrates this authorship classification process using machine learning techniques. In total, 46,895 tweets were used as training and testing data, and features specific to Twitter were extracted. The pre-processing phase included several steps: removal of short texts, removal of stop-words and punctuation, tokenizing, and stemming. The pre-processed data are then transformed into a set of feature vectors in Python. Logistic regression and naive Bayes algorithms were applied to the set of feature vectors to train and test the classifiers. The logistic regression classifier achieved the higher accuracy, 91.1%, compared with 89.8% for the naive Bayes classifier.
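The steps named in the abstract (pre-processing, feature extraction, then training and testing two classifiers) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses only the Python standard library, so the stemmer is a crude suffix stripper and both classifiers are small hand-written versions. The toy tweets, the author names `alice` and `bob`, the stop-word list, the `min_tokens` cutoff, the placeholder tokens, and the learning-rate and epoch settings are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stop-word list (an assumption; the paper does not give its list).
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "in", "on", "my"}

def preprocess(tweet, min_tokens=3):
    """Tokenize, drop stop-words and punctuation, crudely stem.
    Returns None for short texts (the 3-token cutoff is hypothetical)."""
    text = tweet.lower()
    # Keep Twitter-specific cues as placeholder tokens before punctuation is stripped.
    text = re.sub(r"https?://\S+", " urltoken ", text)
    text = re.sub(r"@\w+", " mentiontoken ", text)
    text = re.sub(r"#(\w+)", r" hashtagtoken \1 ", text)
    tokens = [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOP_WORDS]
    # Naive suffix stripping stands in for a real stemmer.
    stems = [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]
    return stems if len(stems) >= min_tokens else None

def train_naive_bayes(docs, labels):
    """Multinomial naive Bayes: collect per-author token counts."""
    vocab = {t for d in docs for t in d}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    return vocab, class_docs, word_counts

def predict_naive_bayes(model, doc):
    vocab, class_docs, word_counts = model
    total_docs = sum(class_docs.values())
    def log_posterior(y):
        denom = sum(word_counts[y].values()) + len(vocab)  # add-one smoothing
        score = math.log(class_docs[y] / total_docs)
        for t in doc:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[y][t] + 1) / denom)
        return score
    return max(class_docs, key=log_posterior)

def _features(doc):
    feats = Counter(doc)      # bag-of-words counts as the feature vector
    feats["__bias__"] = 1     # intercept term
    return feats

def _softmax(scores):
    m = max(scores.values())
    exps = {y: math.exp(s - m) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train_logistic_regression(docs, labels, epochs=100, lr=0.1):
    """Softmax logistic regression fitted by stochastic gradient ascent."""
    classes = sorted(set(labels))
    weights = {y: defaultdict(float) for y in classes}
    for _ in range(epochs):
        for doc, y_true in zip(docs, labels):
            feats = _features(doc)
            probs = _softmax({y: sum(weights[y][t] * v for t, v in feats.items())
                              for y in classes})
            for y in classes:
                error = (1.0 if y == y_true else 0.0) - probs[y]
                for t, v in feats.items():
                    weights[y][t] += lr * error * v
    return classes, weights

def predict_logistic_regression(model, doc):
    classes, weights = model
    feats = _features(doc)
    return max(classes, key=lambda y: sum(weights[y].get(t, 0.0) * v
                                          for t, v in feats.items()))

# Hypothetical toy corpus; the paper used 46,895 real tweets.
tweets = [
    ("alice", "Loving the new #python release, testing it today! https://example.com"),
    ("alice", "Debugging my #python scripts all morning, coffee needed"),
    ("alice", "Another day coding and testing python tools"),
    ("bob", "Great match tonight, the team played amazing football @fan"),
    ("bob", "Football training was tough but the team looked strong"),
    ("bob", "Watching football highlights with the team tonight"),
    ("bob", "ok"),  # too short, dropped during pre-processing
]
pairs = [(author, preprocess(text)) for author, text in tweets]
labels = [a for a, d in pairs if d is not None]
docs = [d for a, d in pairs if d is not None]

nb_model = train_naive_bayes(docs, labels)
lr_model = train_logistic_regression(docs, labels)

held_out = [("alice", "testing python code today with coffee"),
            ("bob", "the team won the football match")]
for name, model, predict in (("naive Bayes", nb_model, predict_naive_bayes),
                             ("logistic regression", lr_model, predict_logistic_regression)):
    correct = sum(predict(model, preprocess(t)) == a for a, t in held_out)
    print(f"{name}: {correct}/{len(held_out)} held-out tweets attributed correctly")
```

On real data, the same comparison would normally use an established stemmer and library classifiers with a proper train/test split; the toy corpus here is far too small for its accuracy to say anything about either method.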

Bibliographic record

Source: IEEE International Conference on Information Reuse and Integration | 2018 | xix, 545 p. | 8 pages
Authors: Opeyemi Aborisade; Mohd Anwar
Original format: PDF
Chinese Library Classification: automatic information theory
Keywords: Twitter; Feature extraction; Logistics; Machine learning; Training; Electronic mail

Similar literature

1. Text Classification for Authorship Attribution Using Naive Bayes Classifier with Limited Training Data [J]. Fatma Howedi, Masnizah Mohd. Computer Engineering and Intelligent Systems. 2014, No. 4.
2. Text Classification for Authorship Attribution Using Naive Bayes Classifier with Limited Training Data [J]. Fatma Howedi, Masnizah Mohd. Journal of Economics and Sustainable Development. 2014, No. 4.
3. Comparison of a logistic regression and Naive Bayes classifier in landslide susceptibility assessments: The influence of models complexity and training dataset size [J]. Tsangaratos Paraskevas, Ilia Ioanna. Catena: An Interdisciplinary Journal of Soil Science Hydrology-Geomorphology Focusing on Geoecology and Landscape Evolution. 2016.
4. Classification for Authorship of Tweets by Comparing Logistic Regression and Naive Bayes Classifiers [C]. Opeyemi Aborisade, Mohd Anwar. 2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science. 2018.
5. Application of a Hidden Bayes Naive Multiclass Classifier in Network Intrusion Detection [D]. Koc, Levent. 2013.
6. Naive Bayes classifiers for verbal autopsies: comparison to physician-based classification for 21000 child and adult deaths [O]. Pierre Miasnikof, Vasily Giannakeas, Mireille Gomes, 2015.
7. Comparison of Naive Bayes, Random Forest, Decision Tree, Support Vector Machines, and Logistic Regression Classifiers for Text Reviews Classification [O]. Tomas Pranckevičius, Virginijus Marcinkevičius. 2017.
