International Conference on Computational Linguistics (COLING)

Why Gender and Age Prediction from Tweets is Hard: Lessons from a Crowdsourcing Experiment



Abstract

There is a growing interest in automatically predicting the gender and age of authors from texts. However, most research so far ignores that language use is related to the social identity of speakers, which may be different from their biological identity. In this paper, we combine insights from sociolinguistics with data collected through an online game, to underline the importance of approaching age and gender as social variables rather than static biological variables. In our game, thousands of players guessed the gender and age of Twitter users based on tweets alone. We show that more than 10% of the Twitter users do not employ language that the crowd associates with their biological sex. It is also shown that older Twitter users are often perceived to be younger. Our findings highlight the limitations of current approaches to gender and age prediction from texts.

