The Journal of Artificial Intelligence Research

Predicting Twitter User Demographics using Distant Supervision from Website Traffic Data


Abstract

Understanding the demographics of users of online social networks has important applications for health, marketing, and public messaging. Whereas most prior work relies on supervised learning, in which individual users are labeled with demographics for training, we instead create a distantly labeled dataset by collecting audience measurement data for 1,500 websites (e.g., 50% of visitors to gizmodo.com are estimated to have a bachelor's degree). We then fit a regression model to predict these demographics from information about the followers of each website on Twitter. Using patterns derived from both the textual content and the social network of each user, our final model produces an average held-out correlation of 0.77 across seven different variables (age, gender, education, ethnicity, income, parental status, and political preference). We then apply this model to classify individual Twitter users by ethnicity, gender, and political preference, finding performance that is surprisingly competitive with a fully supervised approach.
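
The distant-supervision idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' implementation: the abstract says only "a regression model", so ridge regression is swapped in here, the two-column feature matrix and all numbers are invented toy data, and the function names (`fit_ridge`, `predict`, `pearson`) are hypothetical. Each row stands for a website, each feature for the fraction of that website's Twitter followers who also follow some other account, and each label for the website's audience-level demographic estimate.

```python
import math

def predict(w, b, x):
    """Linear prediction: intercept plus weighted sum of features."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def fit_ridge(X, y, l2=0.01, lr=0.1, epochs=3000):
    """Fit ridge regression (L2-penalized weights, free intercept)
    by batch gradient descent. Assumed stand-in for the paper's model."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        for j in range(d):
            w[j] -= lr * (grad_w[j] + l2 * w[j])
        b -= lr * grad_b
    return w, b

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Invented toy data. Rows are websites; column j is the fraction of the
# website's followers who also follow hypothetical account j.
train_X = [[0.8, 0.1], [0.6, 0.3], [0.3, 0.6], [0.1, 0.9], [0.4, 0.4]]
# Invented audience labels, e.g. fraction of visitors with a given trait.
train_y = [0.61, 0.51, 0.36, 0.25, 0.42]
test_X = [[0.5, 0.5], [0.7, 0.2], [0.2, 0.7]]
test_y = [0.45, 0.56, 0.31]

# Train on website-level (distant) labels only; no user is ever labeled.
w, b = fit_ridge(train_X, train_y)

# Held-out correlation between predicted and measured audience values.
held_out_r = pearson([predict(w, b, x) for x in test_X], test_y)

# Apply the website-level model to individual users: a user's features are
# binary indicators of which of the same accounts that user follows.
user_a = [1.0, 0.0]  # follows hypothetical account 0 only
user_b = [0.0, 1.0]  # follows hypothetical account 1 only
score_a, score_b = predict(w, b, user_a), predict(w, b, user_b)
```

In this sketch, `held_out_r` plays the role of the held-out correlation reported in the abstract, and comparing `score_a` with `score_b` (or thresholding each score) is one simple way a model trained on aggregate labels could be turned into a per-user classifier.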
