Journal of the American Medical Informatics Association

Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task


Abstract

Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data.

Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3), and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks.

Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems.

Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1).

Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies.
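The three metrics named in the Results (single-class F1-score, micro-averaged F1-score over a subset of classes, and accuracy) and the idea of combining system runs can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the toy labels, and the use of simple majority voting as the ensemble rule. The abstract does not say how the shared-task ensembles were built, and this is not the organizers' evaluation script.

```python
from collections import Counter


def class_f1(gold, pred, target):
    """F1-score for one class (e.g. the positive ADR class)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def micro_f1(gold, pred, classes):
    """Micro-averaged F1: pool TP/FP/FN over the chosen classes, then score."""
    pairs = list(zip(gold, pred))
    tp = fp = fn = 0
    for c in classes:
        tp += sum(1 for g, p in pairs if g == c and p == c)
        fp += sum(1 for g, p in pairs if g != c and p == c)
        fn += sum(1 for g, p in pairs if g == c and p != c)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(gold, pred):
    """Fraction of instances whose predicted label matches the gold label."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


def majority_vote(runs):
    """Hypothetical ensemble rule: per-instance majority label across runs."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*runs)]


# Toy binary data (invented): 1 = tweet reports an ADR, 0 = it does not.
gold = [1, 0, 0, 1, 0, 1]
run_a = [1, 0, 1, 0, 0, 1]
run_b = [0, 0, 0, 1, 0, 1]
run_c = [1, 1, 0, 1, 0, 0]
ensemble = majority_vote([run_a, run_b, run_c])

# Toy multi-class data (invented), scored only on classes 1 and 2.
gold3 = [1, 2, 3, 1, 2, 3]
pred3 = [1, 2, 1, 3, 2, 3]

if __name__ == "__main__":
    print("run_a ADR F1:", class_f1(gold, run_a, 1))
    print("ensemble ADR F1:", class_f1(gold, ensemble, 1))
    print("micro F1 (classes 1, 2):", micro_f1(gold3, pred3, (1, 2)))
    print("accuracy:", accuracy(gold3, pred3))
```

In this toy case each individual run makes two errors, but the errors fall on different tweets, so the vote recovers every gold label. That is the general intuition behind ensemble gains; real systems often make correlated errors, so the improvement is typically smaller.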
