Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation

Abstract

Pretrained contextual and non-contextual subword embeddings have become available in over 250 languages, allowing massively multilingual NLP. However, while there is no dearth of pretrained embeddings, the distinct lack of systematic evaluations makes it difficult for practitioners to choose between them. In this work, we conduct an extensive evaluation comparing non-contextual subword embeddings, namely FastText and BPEmb, and a contextual representation method, namely BERT, on multilingual named entity recognition and part-of-speech tagging. We find that, overall, a combination of BERT, BPEmb, and character representations works well across languages and tasks. A more detailed analysis reveals different strengths and weaknesses: multilingual BERT performs well in medium- to high-resource languages, but is outperformed by non-contextual subword embeddings in a low-resource setting.
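Since the abstract hinges on combining contextual and non-contextual subword representations, a minimal sketch of that combination may help. The Python example below is illustrative only, not the authors' exact architecture (their tagger, pooling choices, and hyperparameters differ, and character representations are omitted here for brevity): it embeds one pre-tokenized sentence with multilingual BERT and multilingual BPEmb, mean-pools both to the word level, concatenates the vectors, and feeds an untrained placeholder linear tagging head. The example sentence, tag count, and pooling strategy are assumptions.

```python
# Illustrative sketch: combine contextual (multilingual BERT) and
# non-contextual (BPEmb) subword representations for sequence tagging.
# Assumes: pip install torch transformers bpemb numpy
import numpy as np
import torch
from bpemb import BPEmb
from transformers import AutoModel, AutoTokenizer

words = ["Obama", "visited", "Paris", "."]  # assumed example sentence
num_tags = 5  # assumed size of a toy BIO tag set

# Non-contextual subword embeddings: multilingual BPEmb
# (100k-merge multilingual model, 300 dimensions).
bpemb = BPEmb(lang="multi", vs=100000, dim=300)
# One vector per word: mean-pool over its BPE subword embeddings.
bpemb_vecs = torch.tensor(
    np.stack([bpemb.embed(w).mean(axis=0) for w in words]),
    dtype=torch.float32,
)  # shape: (num_words, 300)

# Contextual subword embeddings: multilingual BERT.
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
enc = tok(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    hidden = bert(**enc).last_hidden_state[0]  # (num_subwords, 768)

# Mean-pool BERT subword states back to the word level; word_ids()
# maps each subword position to its source word (None for specials).
word_ids = enc.word_ids(0)
bert_vecs = torch.stack([
    hidden[[i for i, wid in enumerate(word_ids) if wid == w]].mean(dim=0)
    for w in range(len(words))
])  # shape: (num_words, 768)

# Concatenate both views and apply a placeholder tagging head.
features = torch.cat([bert_vecs, bpemb_vecs], dim=-1)  # (num_words, 1068)
tagger = torch.nn.Linear(features.size(-1), num_tags)  # untrained stand-in
logits = tagger(features)
print(logits.shape)  # torch.Size([4, 5])
```

In practice the concatenated features would feed a trained sequence model, for example a recurrent encoder with a CRF output layer, rather than an untrained linear layer.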
