
A Named Entity Recognition Shootout for German


Abstract

We ask how to build, in practice, a model for German named entity recognition (NER) that performs at the state of the art on both contemporary and historical texts, i.e., in a big-data and a small-data scenario. We pit the two best-performing model families, linear-chain CRFs and BiLSTMs, against each other to observe the trade-off between expressiveness and data requirements. The BiLSTM outperforms the CRF when large datasets are available but performs worse on the smallest dataset. BiLSTMs profit substantially from transfer learning, which enables them to be trained on multiple corpora. The result is a new state-of-the-art model for German NER on two contemporary German corpora (CoNLL 2003 and GermEval 2014) and two historical corpora.
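One of the two model families compared above, the linear-chain CRF, labels a sentence by choosing the single highest-scoring tag sequence, which is found with the Viterbi algorithm. The following is a minimal sketch of that decoding step, assuming the per-token emission scores and the tag-to-tag transition scores have already been computed. The function name `viterbi_decode`, the three-tag set (O, B-PER, I-PER) and the toy scores are hypothetical illustrations, not the paper's implementation or data.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][k]: score of tag k at token position t (hypothetical inputs).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    # best[k]: best score of any partial path that ends in tag k
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Pick the predecessor tag i that maximises the score into tag j.
            scores = [best[i] + transitions[i][j] for i in range(num_tags)]
            i_best = max(range(num_tags), key=scores.__getitem__)
            new_best.append(scores[i_best] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    last = max(range(num_tags), key=best.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Toy example: tags 0 = O, 1 = B-PER, 2 = I-PER for "Angela Merkel sagte".
TAGS = ["O", "B-PER", "I-PER"]
emissions = [[0.1, 2.0, 0.0], [0.5, 0.2, 1.5], [2.0, 0.1, 0.3]]
# Strongly penalise the invalid transition O -> I-PER.
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
path, score = viterbi_decode(emissions, transitions)
print([TAGS[k] for k in path], score)
```

With these made-up scores the decoder returns `B-PER, I-PER, O` with a total score of 5.5. The transition matrix is what lets a CRF rule out label sequences such as O followed by I-PER, a constraint that a tagger making independent per-token decisions cannot express.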