Inferring latent attributes of Twitter users with label regularization

Abstract

Inferring latent attributes of online users has many applications in public health, politics, and marketing. Most existing approaches rely on supervised learning algorithms, which require manual data annotation and therefore are costly to develop and adapt over time. In this paper, we propose a lightly supervised approach based on label regularization to infer the age, ethnicity, and political orientation of Twitter users. Our approach learns from a heterogeneous collection of soft constraints derived from Census demographics, trends in baby names, and Twitter accounts that are emblematic of class labels. To counteract the imprecision of such constraints, we compare several constraint selection algorithms that optimize classification accuracy on a tuning set. We find that using no user-annotated data, our approach is within 2% of a fully supervised baseline for three of four tasks. Using a small set of labeled data for tuning further improves accuracy on all tasks.
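The abstract does not state the training objective. A common formulation of label regularization, the expectation-regularization objective of Mann and McCallum (2007), trains a classifier so that, for each group of unlabeled users covered by a soft constraint, the model's average predicted label distribution matches the constraint's target distribution under KL divergence. The pure-Python sketch below assumes that formulation with a multinomial logistic regression model. The function names, the (users, target distribution) bag format, the learning rate, the L2 weight, and the toy data are hypothetical illustrations, not the authors' implementation. Constraint selection is not shown.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, x):
    """Posterior p(y | x) of a linear softmax model with weights W (K rows x D columns)."""
    return softmax([sum(w_d * x_d for w_d, x_d in zip(w_k, x)) for w_k in W])


def kl(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions given as lists."""
    return sum(pk * math.log((pk + eps) / (qk + eps)) for pk, qk in zip(p, q) if pk > 0)


def fit_label_regularization(bags, n_classes, n_features, lr=0.5, l2=1e-3, epochs=500, seed=0):
    """Batch gradient descent on sum_b KL(target_b || mean_{x in b} p(y | x)) + l2 * ||W||^2.

    Hypothetical helper. bags: list of (X, target) pairs, where X is a list of unlabeled
    feature vectors (users) and target is the expected label distribution for that group
    (a soft constraint). No per-user labels are used.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)] for _ in range(n_classes)]
    for _ in range(epochs):
        # Start from the gradient of the L2 penalty.
        grad = [[2.0 * l2 * w for w in row] for row in W]
        for X, target in bags:
            n = len(X)
            P = [predict(W, x) for x in X]
            # Model's expected label distribution for this bag.
            p_hat = [sum(p[k] for p in P) / n for k in range(n_classes)]
            ratio = [t / max(ph, 1e-12) for t, ph in zip(target, p_hat)]
            for x, p in zip(X, P):
                s = sum(r_k * p_k for r_k, p_k in zip(ratio, p))
                for j in range(n_classes):
                    # Contribution of user x to d/dW[j] of -sum_k target_k * log p_hat_k.
                    coef = -p[j] * (ratio[j] - s) / n
                    for d in range(n_features):
                        grad[j][d] += coef * x[d]
        for j in range(n_classes):
            for d in range(n_features):
                W[j][d] -= lr * grad[j][d]
    return W


if __name__ == "__main__":
    # Hypothetical toy data; features are [bias, signal].
    # Bag A (70% class 1 expected) is mostly signal = +1;
    # bag B (30% class 1 expected) is mostly signal = -1.
    bag_a = ([[1.0, 1.0]] * 3 + [[1.0, -1.0]], [0.3, 0.7])
    bag_b = ([[1.0, 1.0]] + [[1.0, -1.0]] * 3, [0.7, 0.3])
    W = fit_label_regularization([bag_a, bag_b], n_classes=2, n_features=2)
    print("p(class 1 | signal=+1):", round(predict(W, [1.0, 1.0])[1], 3))
    print("p(class 1 | signal=-1):", round(predict(W, [1.0, -1.0])[1], 3))
```

Because the loss compares only aggregate distributions, no individual user label enters training; a Census-derived constraint, for instance, would contribute only an expected label mix for a group of users. Plain batch gradient descent is used here for transparency; a practical system would more likely use a vectorized optimizer.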