Conference: International Conference on Computational Linguistics

Why Gender and Age Prediction from Tweets is Hard: Lessons from a Crowdsourcing Experiment

Abstract

There is a growing interest in automatically predicting the gender and age of authors from texts. However, most research so far ignores that language use is related to the social identity of speakers, which may be different from their biological identity. In this paper, we combine insights from sociolinguistics with data collected through an online game, to underline the importance of approaching age and gender as social variables rather than static biological variables. In our game, thousands of players guessed the gender and age of Twitter users based on tweets alone. We show that more than 10% of the Twitter users do not employ language that the crowd associates with their biological sex. It is also shown that older Twitter users are often perceived to be younger. Our findings highlight the limitations of current approaches to gender and age prediction from texts.