Conference: International Joint Conference on Natural Language Processing

Training Data Augmentation for Detecting Adverse Drug Reactions in User-Generated Content


Abstract

Social media provides a timely yet challenging data source for adverse drug reaction (ADR) detection. Existing dictionary-based, semi-supervised learning approaches are intrinsically limited by the coverage and maintainability of lay health vocabularies. In this paper, we introduce a data augmentation approach that leverages variational autoencoders to learn high-quality data distributions from a large unlabeled dataset and then to automatically generate a large labeled training set from a small set of labeled samples. This enables efficient ADR detection on social media, with low training and re-training costs when adapting to changes in informal lay medical terms and the emergence of new ones. An extensive evaluation on Twitter and Reddit data shows that our approach matches the performance of fully supervised approaches while requiring only 25% of the training data.
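The abstract does not spell out how generated samples receive their labels. One plausible reading is sketched here, assuming a VAE has already been trained on the large unlabeled corpus and that each synthetic sample inherits the label of the labeled example it was generated from. Each labeled example is encoded to a latent Gaussian, new latent codes are drawn with the reparameterization trick (z = mu + sigma * eps), and each code is decoded into a new training sample. The names `augment`, `toy_encode`, and `toy_decode` are hypothetical; the toy encoder and decoder stand in for trained networks and operate on fixed-length feature vectors rather than on tweet or Reddit text, so this illustrates the sampling step only, not the paper's implementation.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> Vector:
    """Sample z = mu + sigma * eps, eps ~ N(0, I), with sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def augment(
    labeled: Sequence[Tuple[Vector, int]],
    encode: Callable[[Vector], Tuple[Vector, Vector]],
    decode: Callable[[Vector], Vector],
    copies_per_example: int,
    seed: int = 0,
) -> List[Tuple[Vector, int]]:
    """Generate `copies_per_example` synthetic samples around each labeled example."""
    rng = random.Random(seed)  # seeded so the augmented set is reproducible
    synthetic: List[Tuple[Vector, int]] = []
    for x, label in labeled:
        mu, logvar = encode(x)  # approximate posterior q(z | x)
        for _ in range(copies_per_example):
            z = reparameterize(mu, logvar, rng)
            # Assumption: a sample decoded from a seed's posterior keeps the seed's label.
            synthetic.append((decode(z), label))
    return synthetic


def toy_encode(x: Vector) -> Tuple[Vector, Vector]:
    # Hypothetical stand-in for a trained encoder: mean = input, sigma = 0.1 per dimension.
    return list(x), [math.log(0.01)] * len(x)


def toy_decode(z: Vector) -> Vector:
    # Hypothetical stand-in for a trained decoder: identity map back to feature space.
    return list(z)


# Hypothetical seed set: 2-D feature vectors, label 1 = mentions an ADR, 0 = does not.
labeled = [([0.9, 0.1], 1), ([0.2, 0.8], 0), ([0.7, 0.3], 1)]
synthetic = augment(labeled, toy_encode, toy_decode, copies_per_example=4, seed=42)
print(len(synthetic))  # 3 seeds x 4 copies = 12
```

Drawing latent codes near each labeled example's posterior mean is what lets a small labeled seed set yield many labeled variants; with a text VAE, the decoder would emit token sequences instead of feature vectors.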
