Venue: International Joint Conference on Natural Language Processing; Conference on Empirical Methods in Natural Language Processing

Training Data Augmentation for Detecting Adverse Drug Reactions in User-Generated Content


Abstract

Social media provides a timely yet challenging data source for adverse drug reaction (ADR) detection. Existing dictionary-based, semi-supervised learning approaches are intrinsically limited by the coverage and maintainability of lay health vocabularies. In this paper, we introduce a data augmentation approach that leverages variational autoencoders to learn high-quality data distributions from a large unlabeled dataset and, subsequently, to automatically generate a large labeled training set from a small set of labeled samples. This allows for efficient social-media ADR detection, with low training and re-training costs for adapting to changes in, and the emergence of, informal lay medical terms. An extensive evaluation performed on Twitter and Reddit data shows that our approach matches the performance of fully supervised approaches while requiring only 25% of the training data.
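The abstract does not spell out how the labeled set is generated, so the following is only a minimal sketch of one common pattern for VAE-based augmentation: encode each labeled seed, draw several latent vectors around its posterior mean with the reparameterization trick, decode them, and let each synthetic example inherit the seed's label. The names `augment`, `toy_encode`, and `toy_decode` are hypothetical, the toy encoder and decoder are stand-ins for a model trained on unlabeled posts, and plain feature vectors are used in place of text. It should not be read as the authors' implementation.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def augment(labeled, encode, decode, n_per_sample, rng):
    """Generate n_per_sample synthetic examples for each labeled seed.

    Assumption: a synthetic example keeps the label of the seed it was
    sampled around. The paper may assign labels differently.
    """
    synthetic = []
    for x, label in labeled:
        mu, log_var = encode(x)
        for _ in range(n_per_sample):
            z = reparameterize(mu, log_var, rng)
            synthetic.append((decode(z), label))
    return synthetic


# Hypothetical stand-ins for a VAE trained on a large unlabeled corpus.
def toy_encode(x):
    # Posterior mean equals the input; log-variance is small and constant.
    return list(x), [-4.0] * len(x)


def toy_decode(z):
    # Identity decoder, used only to keep the sketch self-contained.
    return list(z)


rng = random.Random(0)
seeds = [([0.0, 1.0], "ADR"), ([1.0, 0.0], "no-ADR")]
augmented = augment(seeds, toy_encode, toy_decode, n_per_sample=3, rng=rng)
```

Here two labeled seeds yield six labeled synthetic examples. In a real pipeline the encoder and decoder would operate on text, and the augmented set would be used to train the ADR classifier.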
