
TWilBert: Pre-trained deep bidirectional transformers for Spanish Twitter


Abstract

In recent years, the Natural Language Processing community has been moving from uncontextualized word embeddings towards contextualized word embeddings. Among these contextualized architectures, BERT stands out due to its capacity to compute bidirectional contextualized word representations. However, the competitive performance it obtains on English downstream tasks is not matched by its multilingual version when applied to other languages and domains. This is especially true for the Spanish language as used on Twitter. In this work, we propose TWilBERT, a specialization of the BERT architecture for both the Spanish language and the Twitter domain. Furthermore, we propose a Reply Order Prediction signal to learn inter-sentence coherence in Twitter conversations, which improves the performance of TWilBERT on text classification tasks that require reasoning over sequences of tweets. We perform an extensive evaluation of TWilBERT models on 14 different text classification tasks, such as irony detection, sentiment analysis, and emotion detection. The results obtained by TWilBERT outperform the state-of-the-art systems and Multilingual BERT. In addition, we carry out a thorough analysis of the TWilBERT models to study the reasons for their competitive behavior. We release the pre-trained TWilBERT models used in this paper, along with a framework for training, evaluating, and fine-tuning TWilBERT models. (C) 2020 Elsevier B.V. All rights reserved.
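
To make the Reply Order Prediction signal concrete, the following is a minimal Python sketch of how such training pairs could be built, assuming a BERT-style [CLS]/[SEP] input packing; the helper name, the label convention, and the example tweets are illustrative assumptions and do not come from the released TWilBERT framework.

    import random

    def make_rop_example(tweet: str, reply: str, swap_prob: float = 0.5):
        """Build one Reply Order Prediction training pair (illustrative).

        With probability `swap_prob` the two segments are swapped and the
        label flips, so a BERT-style encoder reading the packed sequence
        must decide whether segment B is really the reply to segment A.
        """
        if random.random() < swap_prob:
            # Out-of-order pair: the reply is shown first -> negative label.
            return f"[CLS] {reply} [SEP] {tweet} [SEP]", 0
        # In-order pair: the tweet followed by its actual reply -> positive label.
        return f"[CLS] {tweet} [SEP] {reply} [SEP]", 1

    # Hypothetical tweet-reply pair, for illustration only.
    conversations = [("Que golazo!!", "jajaja increible, que partido")]
    examples = [make_rop_example(t, r) for t, r in conversations]
    print(examples[0])

By analogy with BERT's Next Sentence Prediction objective, the classification here operates on tweet-reply order within a conversation thread, which is what lets the pre-trained model learn inter-sentence coherence on Twitter.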

Bibliographic record

  • Source
    Neurocomputing | 2021, Issue 22 | pp. 58-69 | 12 pages
  • Author affiliation

    Univ Politecn Valencia, VRAIN Valencian Res Inst Artificial Intelligence, Cami Vera Sn, Valencia 46022, Spain

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English (eng)
  • CLC classification:
  • Keywords

    Contextualized Embeddings; Spanish; Twitter; TWilBERT;
