Assessment of Word Embedding Techniques for Identification of Personal Experience Tweets Pertaining to Medication Uses

Conference: Health Intelligence and Medicine

Abstract

Twitter, a general-purpose social media service, has attracted growing interest as an active data source for post-market surveillance of medicinal products. For surveillance purposes, identifying Twitter posts that describe personal experience with medication use is as important as identifying expressions of adverse medical events or reactions. Identifying personal experience tweets is a challenging task, especially with respect to engineering features for classification. In many text classification applications, word embedding has become a superior alternative to engineered features. We wanted to know whether word embedding-based methods consistently outperform conventional classification methods that use engineered features. To find out, we assessed the classification performance of 4 word embedding techniques: GloVe, word2vec, fastText, and wordRank. We learned the word embeddings from a corpus of 22 million unlabeled tweets and ran classification on a corpus of 12,331 annotated tweets. Word embedding-based classification methods consistently outperformed engineered feature-based classification methods (p < 0.01). However, there were no statistically significant differences among the 4 word embedding methods studied (at the 0.05 significance level).
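The abstract does not say how the learned word vectors become tweet-level features, or which classifier uses them. One common baseline averages the vectors of a tweet's in-vocabulary words and feeds the result to a linear classifier such as logistic regression. This is assumed here and not taken from the paper. The sketch below uses NumPy only. The names `tweet_vector`, `train_logreg`, `predict`, and `EMBEDDINGS` are hypothetical, as are the tiny 2-dimensional vectors and the four toy tweets. They stand in for vectors that GloVe, word2vec, fastText, or wordRank would learn from a large unlabeled corpus.

```python
import numpy as np

# Hypothetical 2-D embedding table; real vectors would be learned by
# GloVe, word2vec, fastText, or wordRank from unlabeled tweets.
EMBEDDINGS = {
    "i": np.array([0.1, 0.0]),
    "took": np.array([0.9, 0.1]),
    "my": np.array([0.8, 0.0]),
    "aspirin": np.array([0.5, 0.5]),
    "new": np.array([-0.7, 0.0]),
    "study": np.array([-0.8, 0.2]),
    "shows": np.array([-0.9, 0.1]),
}
DIM = 2


def tweet_vector(tokens, embeddings=EMBEDDINGS, dim=DIM):
    """Mean-pool the vectors of in-vocabulary tokens (assumed pooling scheme).

    Returns a zero vector if none of the tokens has an embedding.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(personal experience)
        err = p - y                              # gradient of log-loss w.r.t. logits
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def predict(X, w, b):
    """Label 1 = personal experience tweet, 0 = other."""
    return (X @ w + b > 0).astype(int)


# Hypothetical annotated tweets, already tokenized and lowercased.
tweets = [
    ["i", "took", "my", "aspirin"],  # personal experience
    ["my", "aspirin"],               # personal experience
    ["study", "shows", "aspirin"],   # not personal experience
    ["new", "study"],                # not personal experience
]
labels = np.array([1, 1, 0, 0])

X = np.vstack([tweet_vector(t) for t in tweets])
w, b = train_logreg(X, labels)
print(predict(X, w, b))
```

In this setup, comparing embedding techniques amounts to swapping the `EMBEDDINGS` table while the pooling step and classifier stay fixed.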