Journal: Information

Transfer Learning for Named Entity Recognition in Financial and Biomedical Documents

Abstract

Recent deep learning approaches have shown promising results for named entity recognition (NER). Training robust deep learning models generally assumes that a sufficient amount of high-quality annotated training data is available. However, in many real-world scenarios, labeled training data is scarce. In this paper, we consider two use cases: generic entity extraction from financial documents and from biomedical documents. First, we develop a character-based model for NER in financial documents and a word- and character-based model with attention for NER in biomedical documents. Second, we analyze how transfer learning addresses the problem of limited training data in a target domain. We demonstrate through experiments that NER models trained on labeled data from a source domain can serve as base models and then be fine-tuned with a small amount of labeled data to recognize different named entity classes in a target domain. Language models have also attracted interest as a way to improve NER when labeled data is limited. Currently, the most successful language model is BERT. Because of its success in state-of-the-art models, we integrate BERT-based representations into our biomedical NER model alongside word and character information. The results are compared with those of a state-of-the-art model applied to a benchmark biomedical corpus.
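
The hand-off from a source-domain base model to a target-domain model with different entity classes can be sketched as follows. This is a minimal illustration and not the authors' implementation: the two-layer layout, the names `make_tagger` and `transfer`, the dimensions, and both label sets are hypothetical. The sketch covers only the weight transfer; the training loops on the source and target data are omitted.

```python
import random


def init_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_tagger(feature_dim, hidden_dim, labels, seed=0):
    """Build a toy tagger: a shared encoder plus a label-specific output head.

    Hypothetical layout, used only to illustrate which parts are transferred.
    """
    rng = random.Random(seed)
    return {
        "encoder": init_matrix(hidden_dim, feature_dim, rng),  # domain-general part
        "head": init_matrix(len(labels), hidden_dim, rng),     # one row per label
        "labels": list(labels),
    }


def transfer(source, target_labels, seed=1):
    """Start a target-domain tagger from a trained source-domain tagger.

    The encoder weights are copied so that fine-tuning begins from the
    source model's representation. The head is re-initialized because the
    target domain uses a different set of entity classes.
    """
    rng = random.Random(seed)
    hidden_dim = len(source["encoder"])
    return {
        "encoder": [row[:] for row in source["encoder"]],  # copy, do not alias
        "head": init_matrix(len(target_labels), hidden_dim, rng),
        "labels": list(target_labels),
    }


# Hypothetical label sets: a source domain and a target domain with other classes.
source_labels = ["O", "B-ORG", "I-ORG"]
target_labels = ["O", "B-GENE", "I-GENE", "B-CHEM", "I-CHEM"]

source_model = make_tagger(feature_dim=8, hidden_dim=4, labels=source_labels)
# ... train source_model on the source-domain data here (omitted) ...
target_model = transfer(source_model, target_labels)
# ... fine-tune target_model on the small target-domain data set (omitted) ...
```

Copying the encoder instead of sharing it keeps the base model unchanged, so the same source model can be reused as the starting point for several target domains.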