Gender Prediction on Twitter Using Stream Algorithms with N-Gram Character Features

Zachary Miller; Brian Dickinson; Wei Hu

Abstract

The rapid growth of social networks has produced an unprecedented amount of user-generated data, which provides an excellent opportunity for text mining. Authorship analysis, an important part of text mining, attempts to learn about the author of a text through subtle variations in writing style that occur across gender, age, and social groups. Such information has a variety of applications, including advertising and law enforcement. One of the most accessible sources of user-generated data is Twitter, which makes the majority of its user data freely available through its data access API. In this study we seek to identify the gender of users on Twitter using Perceptron and Naive Bayes with selected 1- through 5-gram features from tweet text. Stream applications of these algorithms were employed for gender prediction to handle the speed and volume of tweet traffic. Because informal text, such as tweets, cannot be easily evaluated using traditional dictionary methods, n-gram features were implemented in this study to represent streaming tweets. The large number of 1- through 5-grams requires that only a subset of them be used in gender classification; for this reason, informative n-gram features were chosen using multiple selection algorithms. In the best case, the Naive Bayes and Perceptron algorithms produced accuracy, balanced accuracy, and F-measure above 99%.
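
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `char_ngrams`, `StreamPerceptron`, `prequential`, and `binary_metrics` are hypothetical, the test-then-train evaluation loop is an assumption about how a stream classifier might be scored, the feature-selection step is omitted because the abstract does not name the selection algorithms, and the toy stream is made up. A streaming Naive Bayes model could be dropped into the same loop in place of the perceptron.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=5):
    """Count every character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


class StreamPerceptron:
    """Binary perceptron over sparse n-gram counts; labels are +1 or -1."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def predict(self, features):
        score = self.bias + sum(
            self.weights.get(gram, 0.0) * count for gram, count in features.items()
        )
        return 1 if score > 0 else -1

    def learn(self, features, label):
        """Predict first, then update the weights only on a mistake."""
        guess = self.predict(features)
        if guess != label:
            for gram, count in features.items():
                self.weights[gram] = self.weights.get(gram, 0.0) + label * count
            self.bias += label
        return guess


def prequential(stream, model):
    """Test-then-train pass: each text is scored before the model sees its label."""
    return [model.learn(char_ngrams(text), label) for text, label in stream]


def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, balanced accuracy, F-measure) for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return (tp + tn) / len(pairs), (recall + specificity) / 2, f_measure


# Made-up toy stream with arbitrary class labels; not data from the study.
toy_stream = [("aaa", 1), ("bbb", -1), ("aab", 1), ("bba", -1)]
predictions = prequential(toy_stream, StreamPerceptron())
accuracy, balanced_accuracy, f_measure = binary_metrics(
    [label for _, label in toy_stream], predictions
)
```

Because the model starts with no weights, its earliest predictions in a test-then-train pass are effectively guesses; the sparse dictionary of weights grows only with the n-grams actually seen, which is one reason a real system would restrict it to a selected subset of features.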

Bibliographic details

Source: International Journal of Intelligence Science, 2012, Issue 4, 6 pages
Authors: Zachary Miller; Brian Dickinson; Wei Hu
Original format: PDF
Chinese Library Classification: Artificial intelligence theory
