Future Generation Computer Systems

On the use of distributed semantics of tweet metadata for user age prediction

Abstract

Social media data are an important resource for behavioral analysis of the aging population. This paper addresses the problem of predicting user age from Twitter data, treating prediction as a classification task. For this purpose, we devise an innovative model based on a Convolutional Neural Network (CNN) that relies on language-related features and social-media-specific metadata. More specifically, we introduce two features that have not previously been considered in the literature: the content of URLs and hashtags appearing in tweets. We also employ distributed representations of the words and phrases present in tweets, hashtags, and URLs, pre-trained on appropriate corpora, in order to exploit their semantic information for age prediction. Compared with baseline models, our CNN-based classifier improves the micro-averaged F1 score by up to 12.3% on the Dutch dataset, 9.8% on the English1 dataset, and 6.6% on the English2 dataset. © 2019 The Authors. Published by Elsevier B.V.
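The micro-averaged F1 score used to report these gains pools true positives, false positives, and false negatives over all age classes before computing a single F1 value. The sketch below illustrates that calculation only; the helper `micro_f1`, the age-group labels, and the sample predictions are hypothetical and do not come from the paper.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly assigned to this class
            elif pred == label and truth != label:
                fp += 1  # wrongly assigned to this class
            elif pred != label and truth == label:
                fn += 1  # member of this class that was missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical age-group labels, for illustration only.
y_true = ["18-24", "25-34", "35+", "18-24", "35+"]
y_pred = ["18-24", "35+", "35+", "25-34", "35+"]
score = micro_f1(y_true, y_pred)
print(round(score, 3))
```

When each user receives exactly one age class, every misclassification counts once as a false positive and once as a false negative, so micro-averaged F1 coincides with plain accuracy.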
