Journal: Expert Systems with Applications

Early author profiling on Twitter using profile features with multi-resolution



Abstract

The Author Profiling (AP) task aims to predict demographic characteristics of authors (e.g., age, gender, native language) from their documents. Research so far has focused only on forensic scenarios, performing post-analysis with all the available text evidence. This paper introduces the task of Early Author Profiling (EAP) on Twitter. The goal is to recognize profiles effectively using as few tweets as possible from the user's history. The task is highly relevant to social media analysis and to various security and marketing problems, where prevention and anticipation are crucial. This work proposes a novel strategy that combines a state-of-the-art representation for early text classification with specialized word vectors for author profiling tasks. In this strategy we build prototypical features called Profile based Meta-Words, which allow us to model AP information at different levels of granularity. Our evaluation shows that the proposed methodology is well suited to profiling from little text evidence (e.g., a handful of tweets) in early stages, whereas, as more tweets become available, other granularities better encode larger amounts of text in late stages. We evaluated the proposed ideas on gender and language variety identification for English and Spanish, and showed that the proposal outperforms state-of-the-art methodologies. © 2019 Elsevier Ltd. All rights reserved.
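The early-profiling setting described in the abstract can be illustrated with a minimal sketch of how accuracy might be measured when only the first k tweets of each user are visible. This is not the paper's method: the function `accuracy_by_prefix`, the marker-word classifier `toy_classifier`, the labels "A" and "B", and the toy data are all hypothetical, chosen only to show the shape of an evaluation over growing prefixes of a user's history.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A user is a chronological list of tweets plus a gold profile label.
User = Tuple[List[str], str]
Classifier = Callable[[Sequence[str]], str]


def accuracy_by_prefix(
    users: Sequence[User],
    classify: Classifier,
    prefix_sizes: Sequence[int],
) -> Dict[int, float]:
    """Accuracy of `classify` when it sees only the first k tweets per user."""
    results: Dict[int, float] = {}
    for k in prefix_sizes:
        correct = sum(
            1 for tweets, label in users if classify(tweets[:k]) == label
        )
        results[k] = correct / len(users)
    return results


# Hypothetical stand-in classifier (an assumption, not the paper's model):
# predicts label "A" if any marker word has appeared so far, else "B".
MARKERS = {"vale", "guay"}


def toy_classifier(tweets: Sequence[str]) -> str:
    tokens = set(" ".join(tweets).lower().split())
    return "A" if tokens & MARKERS else "B"


# Made-up toy data, for illustration only.
USERS: List[User] = [
    (["hola", "vale tio", "que guay"], "A"),
    (["que onda", "hola", "hola"], "B"),
    (["hola", "hola", "vale"], "A"),
]

curve = accuracy_by_prefix(USERS, toy_classifier, [1, 2, 3])
for k, acc in curve.items():
    print(f"first {k} tweet(s): accuracy = {acc:.2f}")
```

In this toy setup accuracy can only improve as more tweets are revealed, because the marker words appear late in some histories. An early-profiling method in the sense of the abstract would aim to reach high accuracy at small k.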
