Journal: Bioinformatics

Transfer learning for biomedical named entity recognition with neural networks

Abstract

Motivation: The explosive increase of biomedical literature has made information extraction an increasingly important tool for biomedical research. A fundamental task is biomedical named entity recognition (BNER): identifying entities such as genes/proteins, diseases and species in text. Recently, a domain-independent method based on deep learning and statistical word embeddings, called long short-term memory network-conditional random field (LSTM-CRF), has been shown to outperform state-of-the-art entity-specific BNER tools. However, this method depends on gold-standard corpora (GSCs) consisting of hand-labeled entities. Such corpora tend to be small but highly reliable. Silver-standard corpora (SSCs) are an alternative to GSCs; they are generated by harmonizing the annotations made by several automatic annotation systems. SSCs typically contain more noise than GSCs but have the advantage of containing many more training examples. Ideally, these corpora could be combined to achieve the benefits of both, which is an opportunity for transfer learning. In this work, we analyze to what extent transfer learning improves upon state-of-the-art results for BNER.
