Journal: Wireless Communications & Mobile Computing
A Multichannel Biomedical Named Entity Recognition Model Based on Multitask Learning and Contextualized Word Representations

Abstract

As the biomedical literature grows exponentially, biomedical named entity recognition (BNER) has become an important task in biomedical information extraction. In previous studies based on deep learning, pretrained word embeddings became an indispensable part of neural network models, effectively improving their performance. However, the biomedical literature typically contains numerous polysemous and ambiguous words, so fixed pretrained word representations are not appropriate. Therefore, this paper adopts pretrained embeddings from language models (ELMo) to generate dynamic word embeddings according to context. In addition, to avoid the problem of insufficient training data in specific domains and to introduce richer input representations, we propose a multitask-learning multichannel bidirectional gated recurrent unit (BiGRU) model. Multiple feature representations (e.g., word-level, contextualized word-level, and character-level) are fed into the different channels, either separately or jointly. Manual intervention and feature engineering are avoided because the BiGRU captures features automatically. In the merge layer, multiple methods are designed to integrate the outputs of the multichannel BiGRU. We combine the BiGRU with a conditional random field (CRF) to model label dependencies in sequence labeling. Moreover, we introduce auxiliary corpora that share entity types with the main corpora evaluated in the multitask learning framework, then train our model on these separate corpora while sharing parameters between them. Our model obtains promising results on the JNLPBA and NCBI-disease corpora, with F1-scores of 76.0% and 88.7%, respectively. The latter is the best performance among reported feature-based models.
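The abstract mentions that the merge layer integrates the per-channel BiGRU outputs via multiple methods, but does not specify them. The following is a minimal, hypothetical sketch (NumPy, with made-up shapes and a `merge_channels` helper not taken from the paper) of common ways such channel outputs can be combined, e.g., concatenation along the feature axis versus element-wise summation or averaging:

```python
import numpy as np

def merge_channels(channel_outputs, mode="concat"):
    """Merge per-channel sequence encodings of shape (seq_len, hidden).

    Illustrative only: the paper's actual merge operations are not
    described in the abstract.
    """
    stacked = np.stack(channel_outputs)  # (n_channels, seq_len, hidden)
    if mode == "concat":
        # Keep all channel features side by side: (seq_len, n_channels * hidden)
        return np.concatenate(channel_outputs, axis=-1)
    if mode == "sum":
        # Element-wise sum across channels: (seq_len, hidden)
        return stacked.sum(axis=0)
    if mode == "mean":
        # Element-wise average across channels: (seq_len, hidden)
        return stacked.mean(axis=0)
    raise ValueError(f"unknown merge mode: {mode}")

# Toy example: three channels (word-level, contextualized word-level,
# character-level), a 4-token sentence, hidden size 8.
rng = np.random.default_rng(0)
channels = [rng.standard_normal((4, 8)) for _ in range(3)]
print(merge_channels(channels, "concat").shape)  # (4, 24)
print(merge_channels(channels, "sum").shape)     # (4, 8)
```

Concatenation preserves channel-specific features at the cost of a wider downstream layer, while sum/mean keep the dimensionality fixed; the merged sequence would then feed the CRF tagging layer.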
