Venue: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Augmenting Semantic Representation of Depressive Language: From Forums to Microblogs

Abstract

We discuss and analyze the process of creating word embedding feature representations designed specifically for a learning task for which annotated data is scarce, such as detecting depressive language in Tweets. We start from a rich word embedding pre-trained on a general dataset, then enhance it with an embedding learned from a domain-specific but much smaller dataset. The strengthened representation better portrays the domain of depression we are interested in, because it combines the semantics learned from the specific domain with the word coverage of the general language. We present a comparative analysis of our word embedding representations against a simple bag-of-words model, a well-known sentiment lexicon, a psycholinguistic lexicon, and a general pre-trained word embedding, based on their efficacy in accurately identifying depressive Tweets. We show that our representations achieve a significantly better F1 score than the others when applied to a high-quality dataset.
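The abstract does not say how the general and domain-specific embeddings are combined, so the following is only a minimal sketch of one simple possibility, not the authors' method. It assumes (hypothetically) that both embedding tables have the same dimensionality and live in a comparable vector space: words found in both tables get a weighted average, and words found in only one table keep that table's vector, so the merged table retains the general vocabulary coverage. The names `augment_embeddings`, `tweet_vector`, and `domain_weight` are illustrative and do not come from the paper.

```python
from typing import Dict, List

Vector = List[float]
Embedding = Dict[str, Vector]


def augment_embeddings(general: Embedding, domain: Embedding,
                       domain_weight: float = 0.5) -> Embedding:
    """Merge a general and a domain-specific embedding table.

    Assumption (not from the paper): both tables share the same
    dimensionality and are already in a comparable vector space.
    """
    if not 0.0 <= domain_weight <= 1.0:
        raise ValueError("domain_weight must be in [0, 1]")
    merged: Embedding = {}
    for word in general.keys() | domain.keys():
        g = general.get(word)
        d = domain.get(word)
        if g is not None and d is not None:
            if len(g) != len(d):
                raise ValueError(f"dimension mismatch for {word!r}")
            # Shared word: blend domain semantics with the general vector.
            merged[word] = [domain_weight * dv + (1.0 - domain_weight) * gv
                            for gv, dv in zip(g, d)]
        else:
            # Word known to only one table: keep that vector for coverage.
            merged[word] = list(d if d is not None else g)
    return merged


def tweet_vector(tokens: List[str], embedding: Embedding, dim: int) -> Vector:
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [embedding[t] for t in tokens if t in embedding]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy data, invented purely for illustration.
general = {"sad": [0.0, 1.0], "today": [1.0, 0.0]}
domain = {"sad": [1.0, 1.0], "hopeless": [0.0, 2.0]}
merged = augment_embeddings(general, domain)
features = tweet_vector(["sad", "hopeless", "unknown"], merged, dim=2)
```

The resulting `features` vector could then be passed to any standard classifier. In practice, two independently trained embeddings are usually not in the same space, so an alignment or mapping step would be needed before a merge like this one.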