Workshop on Computational Approaches to Linguistic Code-Switching

Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots

Abstract

Multilingual models have demonstrated impressive cross-lingual transfer performance. However, test sets like XNLI are monolingual at the example level. In multilingual communities, it is common for polyglots to code-mix when conversing with each other. Inspired by this phenomenon, we present two strong black-box adversarial attacks (one word-level, one phrase-level) for multilingual models that push their ability to handle code-mixed sentences to the limit. The former (POLYGLOSS) uses bilingual dictionaries to propose perturbations and translations of the clean example for sense disambiguation. The latter (BUMBLEBEE) directly aligns the clean example with its translations before extracting phrases as perturbations. BUMBLEBEE has a success rate of 89.75% against XLM-R_large, bringing its average accuracy of 79.85 down to 8.18 on XNLI. Finally, we propose an efficient adversarial training scheme, Code-mixed Adversarial Training (CAT), that trains in the same number of steps as the original model. Even after controlling for the extra training data introduced, CAT improves model accuracy when the model is prevented from relying on lexical overlaps (+3.45), with a negligible drop (-0.15 points) in performance on the original XNLI test set. t-SNE visualizations reveal that CAT improves a model's language agnosticity.
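To make the word-level idea concrete, here is a minimal sketch of a dictionary-driven, black-box code-mixing attack. It is not the authors' POLYGLOSS implementation: the greedy left-to-right search, the toy dictionary `TOY_DICT`, the stand-in victim `toy_positive_score`, and the function `greedy_code_mix_attack` are all hypothetical names chosen for illustration, and the sense-disambiguation step via translations of the clean example is omitted. The only access to the victim is through its output score, which is what makes the attack black-box.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical toy bilingual dictionary: English word -> candidate translations.
TOY_DICT: Dict[str, List[str]] = {
    "movie": ["película", "Film"],
    "great": ["genial", "toll"],
}


def toy_positive_score(tokens: Sequence[str]) -> float:
    """Stand-in victim model (assumption): scores the 'positive' label
    using English cue words only, so it is blind to their translations."""
    cues = {"great", "good"}
    hits = sum(tok.lower() in cues for tok in tokens)
    return min(1.0, 0.2 + 0.6 * hits)


def greedy_code_mix_attack(
    tokens: Sequence[str],
    score_fn: Callable[[Sequence[str]], float],
    dictionary: Dict[str, List[str]],
    threshold: float = 0.5,
) -> List[str]:
    """Greedily swap words for dictionary translations that lower the
    victim's score for the gold label, stopping once it falls below
    `threshold`. Only score_fn outputs are used (black-box access)."""
    adversarial = list(tokens)
    for i, tok in enumerate(tokens):
        current = score_fn(adversarial)
        if current < threshold:
            break  # the prediction has already flipped
        best_tok, best_score = adversarial[i], current
        for candidate in dictionary.get(tok.lower(), []):
            trial = adversarial[:i] + [candidate] + adversarial[i + 1:]
            trial_score = score_fn(trial)
            if trial_score < best_score:
                best_tok, best_score = candidate, trial_score
        adversarial[i] = best_tok
    return adversarial


clean = "the movie was great".split()
mixed = greedy_code_mix_attack(clean, toy_positive_score, TOY_DICT)
print(" ".join(mixed))
```

In this toy setting only the swap that actually lowers the score is kept, so the sentence is perturbed at a single position. A real attack against a model such as XLM-R would replace `toy_positive_score` with queries to that model and would likely need a wider search than this greedy pass.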
