
Transfer Learning for Named Entity Recognition in Financial and Biomedical Documents


Abstract

Recent deep learning approaches have shown promising results for named entity recognition (NER). Training robust deep learning models normally assumes that a sufficient amount of high-quality annotated training data is available. However, in many real-world scenarios, labeled training data is scarce. In this paper we consider two use cases: generic entity extraction from financial documents and from biomedical documents. First, we develop a character-based model for NER in financial documents and a word- and character-based model with attention for NER in biomedical documents. Further, we analyze how transfer learning addresses the problem of limited training data in a target domain. We demonstrate through experiments that NER models trained on labeled data from a source domain can be used as base models and then fine-tuned with little labeled data to recognize different named entity classes in a target domain. There is also growing interest in using language models to improve NER as a way of coping with limited labeled data. The current most successful language model is BERT. Because of its success in state-of-the-art models, we integrate BERT-based representations into our biomedical NER model along with word and character information. The results are compared with a state-of-the-art model applied to a benchmark biomedical corpus.
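The base-model idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model layout, the function names `make_model` and `transfer`, and the label sets are all hypothetical, and the fine-tuning step on target-domain data is omitted. The sketch only shows the structural part of the transfer, namely reusing the source model's encoder parameters while attaching a newly initialised output layer for the different entity classes of the target domain.

```python
import copy
import random


def make_model(hidden_dim, labels, seed=0):
    """Build a toy tagger: a shared encoder matrix plus one output vector per label.

    This layout is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    encoder = [
        [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        for _ in range(hidden_dim)
    ]
    head = {
        label: [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        for label in labels
    }
    return {"encoder": encoder, "head": head}


def transfer(source_model, target_labels, seed=1):
    """Copy the source encoder and attach a fresh head for the target label set."""
    hidden_dim = len(source_model["encoder"])
    rng = random.Random(seed)
    return {
        # Encoder weights are reused from the source-domain model.
        "encoder": copy.deepcopy(source_model["encoder"]),
        # The output layer is re-initialised because the entity classes differ.
        "head": {
            label: [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
            for label in target_labels
        },
    }


# Hypothetical label sets for a source and a target domain.
source = make_model(4, ["O", "B-ORG", "I-ORG"])
target = transfer(source, ["O", "B-GENE", "I-GENE"])
```

In a real setting the copied encoder and the new head would then be trained further on the small labeled target-domain set; whether the encoder is frozen or updated during that step is a design choice this sketch does not address.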
