IEEE International Conference on Data Mining

Mining the Demographics of Political Sentiment from Twitter Using Learning from Label Proportions

Abstract

Opinion mining and demographic attribute inference have many applications in social science. In this paper, we propose models to infer daily joint probabilities of multiple latent attributes, such as political sentiment and demographic attributes, from Twitter data. Since annotating data for traditional supervised classification is costly and time-consuming, we instead propose scalable Learning from Label Proportions (LLP) models for demographic and opinion inference, using the U.S. Census, national and state political polls, and the Cook Partisan Voting Index as population-level data. In LLP classification settings, the training data is divided into a set of unlabeled bags, where only the label distribution of each bag is known, removing the requirement of instance-level annotations. Our proposed LLP model, Weighted Label Regularization (WLR), provides a scalable generalization of prior work on label regularization to support weights for samples inside bags, which applies in this setting, where bags are arranged hierarchically (e.g., county-level bags are nested inside state-level bags). We apply our model to Twitter data collected in the year leading up to the 2016 U.S. presidential election, producing estimates of the relationships among political sentiment and demographics over time and place. We find that our approach closely tracks traditional polling data stratified by demographic category, resulting in error reductions of 28-44% over baseline approaches. We also provide descriptive evaluations showing how the model may be used to estimate interactions among many variables and to identify linguistic temporal variation, capabilities that are typically not feasible using traditional polling methods.
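
The bag-level training signal described in the abstract can be illustrated with a minimal sketch. This is not the authors' WLR implementation: it assumes a one-dimensional feature, a binary label, synthetic data, and a plain cross-entropy between each bag's known proportion and the weighted mean of the model's predictions. All names (`make_bag`, `bag_loss_and_grad`, `train`) are hypothetical and exist only for this illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bag_loss_and_grad(w, b, bag):
    """Cross-entropy between a bag's known proportion and the
    weighted mean prediction over its instances (assumed loss)."""
    xs, weights, target = bag["x"], bag["weights"], bag["proportion"]
    total = sum(weights)
    probs = [sigmoid(w * x + b) for x in xs]
    q = sum(a * p for a, p in zip(weights, probs)) / total
    q = min(max(q, 1e-9), 1.0 - 1e-9)  # avoid log(0)
    loss = -(target * math.log(q) + (1.0 - target) * math.log(1.0 - q))
    dloss_dq = (q - target) / (q * (1.0 - q))
    gw = gb = 0.0
    for x, a, p in zip(xs, weights, probs):
        dq_dz = a * p * (1.0 - p) / total  # chain rule through the sigmoid
        gw += dloss_dq * dq_dz * x
        gb += dloss_dq * dq_dz
    return loss, gw, gb


def make_bag(rng, n, pos_rate):
    """Synthetic bag: positives cluster near x=+2, negatives near x=-2."""
    labels = [1 if rng.random() < pos_rate else 0 for _ in range(n)]
    xs = [rng.gauss(2.0 if y else -2.0, 1.0) for y in labels]
    weights = [rng.uniform(0.5, 1.5) for _ in range(n)]  # per-sample weights
    # Only this aggregate is shown to the learner; labels are kept for evaluation.
    proportion = sum(a * y for a, y in zip(weights, labels)) / sum(weights)
    return {"x": xs, "weights": weights, "proportion": proportion, "labels": labels}


def train(bags, lr=0.5, steps=500):
    """Batch gradient descent on the mean bag-level loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for bag in bags:
            _, dw, db = bag_loss_and_grad(w, b, bag)
            gw += dw
            gb += db
        w -= lr * gw / len(bags)
        b -= lr * gb / len(bags)
    return w, b


rng = random.Random(0)
bags = [make_bag(rng, 200, rate) for rate in (0.2, 0.5, 0.8)]
w, b = train(bags)

# Instance labels are used here only to check what the model learned.
correct = total = 0
for bag in bags:
    for x, y in zip(bag["x"], bag["labels"]):
        correct += int((sigmoid(w * x + b) > 0.5) == (y == 1))
        total += 1
accuracy = correct / total
print(f"w={w:.2f} b={b:.2f} instance accuracy={accuracy:.2f}")
```

The classifier never sees an instance label during training; it only sees each bag's aggregate proportion, which plays the role that census or polling figures play in the abstract. The per-sample weights stand in for the weights the abstract attributes to samples inside bags. How the paper derives them from the bag hierarchy is not specified here, so the sketch draws them at random.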
