Predicting latent demographic attributes of Twitter users.

Abstract

Social media websites such as Twitter, Facebook, and LinkedIn aggregate large amounts of textual data. A wealth of user information can be inferred from this data and is potentially useful in advertising, analytics, sentiment analysis, and similar applications. It is estimated that over 60% of people in the US have a Twitter account, and a significant portion of the US population consists of immigrants. As social media have become commonplace, people willingly post personal information such as their name, age, location, and alma mater. This makes it possible to use text classification methods to accurately determine demographic profiles. This thesis focuses on extracting latent demographic information from social media data. Previous works have attempted to determine users' race and ethnicity, while our work focuses on using posts on Twitter (tweets) to determine whether a user is an immigrant or a native US citizen. The method uses the distribution of ethnic names among immigrant and native populations to find and collect users in the United States, along with their tweets, across three racial groups: Asian, Latino, and Caucasian/White. We use a supervised machine learning approach to predict the immigration status of a user by examining the textual content of tweets, using Multinomial Naive Bayes, Support Vector Machines, Logistic Regression, k-Nearest Neighbors, and Decision Trees. We investigate methods for improving the performance of the algorithms and determine how the number of features affects the accuracy of the built models. Additionally, we evaluate which features carry more weight in classifying users, and attempt to discover latent topical patterns in the data corpus using Latent Dirichlet Allocation.
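To make the supervised text-classification step concrete, the following is a minimal sketch of a Multinomial Naive Bayes classifier with Laplace smoothing, one of the five algorithms the abstract names. It is not the thesis's implementation: the function names (`tokenize`, `train_multinomial_nb`, `predict`), the placeholder tokens (`w1` to `w7`), and the labels (`class_a`, `class_b`) are all hypothetical and chosen only for illustration, and the whitespace tokenizer is an assumption rather than the preprocessing the author used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    # Real tweet preprocessing (hashtags, URLs, mentions) is not shown here.
    return text.lower().split()


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and Laplace-smoothed word likelihoods."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    for label, count in class_counts.items():
        model["log_prior"][label] = math.log(count / len(docs))
        # Denominator: total tokens in the class plus alpha per vocabulary word.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            word: math.log((word_counts[label][word] + alpha) / total)
            for word in vocab
        }
    return model


def predict(model, text):
    """Return the label with the highest log posterior score."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][label]
        # Tokens never seen in training are skipped.
        scores[label] = log_prior + sum(
            likelihood[token]
            for token in tokenize(text)
            if token in model["vocab"]
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus: placeholder tokens stand in for tweet words.
train_docs = ["w1 w2 w3", "w1 w4", "w5 w6 w3", "w5 w7"]
train_labels = ["class_a", "class_a", "class_b", "class_b"]
model = train_multinomial_nb(train_docs, train_labels)

print(predict(model, "w1 w2"))
print(predict(model, "w5 w6"))
```

In this toy setup, words that occur mostly in one class pull the prediction toward that class, which is the same mechanism that lets a per-word likelihood table indicate which features carry more weight. Restricting `vocab` to a subset of words would be one simple way to explore how the number of features affects accuracy.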

Bibliographic record

  • Author: Frolov, Georgiy
  • Author affiliation: University of Maryland, Baltimore County
  • Degree-granting institution: University of Maryland, Baltimore County
  • Subject: Computer science; Web studies
  • Degree: M.S.
  • Year: 2016
  • Pages: 94 p.
  • Total pages: 94
  • Original format: PDF
  • Language of text: English
