Conference: IEEE EMBS International Conference on Biomedical and Health Informatics

Uncertainty-based Self-training for Biomedical Keyphrase Extraction


Abstract

To keep pace with the increased generation and digitization of documents, automated methods that improve search, discovery, and mining of the vast body of literature are essential. Keyphrases provide a concise representation of a document by identifying its salient concepts. Various supervised approaches model keyphrase extraction using local context to predict the label for each token, and they perform much better than their unsupervised counterparts. However, existing supervised datasets contain too few annotated examples to train better deep learning models. In contrast, many domains have large amounts of unannotated data that can be leveraged to improve model performance in keyphrase extraction. We introduce a self-learning-based model that incorporates uncertainty estimates to select instances from large-scale unlabeled data to augment the small labeled training set. Performance evaluation on a publicly available biomedical dataset demonstrates that our method improves keyphrase extraction performance over state-of-the-art models.
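The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how uncertainty is estimated or how instances are chosen, so everything specific here is an assumption made for illustration: a BIO tagging scheme, several stochastic forward passes per sentence (for example, with dropout left on), predictive entropy of the averaged label distribution as the uncertainty score, and keeping the `k` least uncertain sentences. The names `sentence_uncertainty`, `select_confident`, and `pool` are hypothetical, and the probabilities are made up.

```python
import math
from typing import List, Sequence, Tuple

# ASSUMPTION: BIO token labels; the abstract does not specify the tag set.
LABELS = ["O", "B", "I"]

# One stochastic pass yields a label distribution per token: [token][label].
TokenProbs = List[List[float]]


def mean_probs(passes: Sequence[TokenProbs]) -> TokenProbs:
    """Average the label distributions token by token across passes."""
    n_passes = len(passes)
    n_tokens = len(passes[0])
    n_labels = len(passes[0][0])
    return [
        [sum(p[t][c] for p in passes) / n_passes for c in range(n_labels)]
        for t in range(n_tokens)
    ]


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one label distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def sentence_uncertainty(passes: Sequence[TokenProbs]) -> float:
    """ASSUMED score: mean per-token entropy of the averaged distribution."""
    avg = mean_probs(passes)
    return sum(entropy(d) for d in avg) / len(avg)


def select_confident(
    unlabeled: Sequence[Tuple[List[str], Sequence[TokenProbs]]], k: int
) -> List[Tuple[List[str], List[str]]]:
    """Return the k least uncertain sentences with argmax pseudo-labels."""
    ranked = sorted(unlabeled, key=lambda item: sentence_uncertainty(item[1]))
    selected = []
    for tokens, passes in ranked[:k]:
        avg = mean_probs(passes)
        labels = [LABELS[max(range(len(d)), key=d.__getitem__)] for d in avg]
        selected.append((tokens, labels))
    return selected


# Toy unlabeled pool: two sentences, two stochastic passes each (made-up numbers).
pool = [
    (["insulin", "resistance"],
     [[[0.05, 0.90, 0.05], [0.05, 0.05, 0.90]],
      [[0.10, 0.85, 0.05], [0.05, 0.10, 0.85]]]),
    (["the", "study"],
     [[[0.4, 0.3, 0.3], [0.5, 0.3, 0.2]],
      [[0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]]),
]

# Pseudo-labeled sentences that would be added to the labeled training set.
augmented = select_confident(pool, k=1)
```

In a full self-training loop, the model would be retrained on the labeled set plus `augmented`, and the selection would be repeated on the remaining unlabeled pool. Whether the paper uses a fixed `k`, a threshold, or a sampling scheme is not stated in the abstract.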
