Conference of Open Innovations Association

Named Entity Recognition in Spanish Biomedical Literature: Short Review and Bert Model

Abstract

Named Entity Recognition (NER) is the first step in knowledge acquisition when dealing with an unknown corpus of texts. Once these entities have been extracted, we can form a parameter space and solve text mining problems such as concept normalization, speech recognition, etc. Recent advances in NER are related to contextualized word embeddings, which transform text into a form that is effective for Deep Learning. In this paper, we show how an NER model detects pharmacological substances, compounds, and proteins in a dataset obtained from the Spanish Clinical Case Corpus (SPACCC). To achieve this goal, we train the BERT language representation model from scratch and fine-tune it for our problem. As expected, this model shows better results than the NER model trained over standard word embeddings. We further conduct an error analysis that shows the origins of the models' errors and proposes strategies to further improve the model's quality.