Conference on Empirical Methods in Natural Language Processing

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models



Abstract

Language models (LMs) have proven surprisingly successful at capturing factual knowledge by completing cloze-style fill-in-the-blank questions such as "Punta Cana is located in _." However, while knowledge is both written and queried in many languages, studies on LMs' factual representation ability have almost invariably been performed on English. To assess factual knowledge retrieval in LMs in different languages, we create a multilingual benchmark of cloze-style probes for 23 typologically diverse languages. To properly handle language variations, we expand probing methods from single- to multi-word entities, and develop several decoding algorithms to generate multi-token predictions. Extensive experimental results provide insights about how well (or poorly) current state-of-the-art LMs perform at this task in languages with more or fewer available resources. We further propose a code-switching-based method to improve the ability of multilingual LMs to access knowledge, and verify its effectiveness on several benchmark languages.
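The abstract mentions decoding algorithms for generating multi-token predictions, since an entity such as "Dominican Republic" spans several mask slots. A minimal sketch of one such strategy, confidence-based mask filling, is below. The `toy_scores` function is a hypothetical stand-in for an actual masked LM (a real probe would query a model such as multilingual BERT for a vocabulary distribution at each mask, re-scoring after every fill); the specific candidates and probabilities are illustrative only.

```python
def toy_scores(tokens, mask_index):
    """Hypothetical stand-in for a masked LM: returns a {candidate: probability}
    distribution for the [MASK] at mask_index. A real LM would condition on the
    tokens filled in so far; this toy table does not."""
    table = {
        5: {"Dominican": 0.9, "Puerto": 0.1},
        6: {"Republic": 0.8, "Rico": 0.2},
    }
    return table[mask_index]

def fill_masks_by_confidence(tokens, score_fn):
    """Repeatedly fill the single [MASK] whose best candidate has the highest
    probability, re-scoring the remaining masks after each fill."""
    tokens = list(tokens)
    while "[MASK]" in tokens:
        masks = [i for i, t in enumerate(tokens) if t == "[MASK]"]
        # For each open mask, find its best candidate, then take the
        # (position, candidate) pair with the highest confidence overall.
        best = max(
            ((i, *max(score_fn(tokens, i).items(), key=lambda kv: kv[1]))
             for i in masks),
            key=lambda t: t[2],
        )
        pos, word, _ = best
        tokens[pos] = word
    return tokens

prompt = ["Punta", "Cana", "is", "located", "in", "[MASK]", "[MASK]", "."]
print(" ".join(fill_masks_by_confidence(prompt, toy_scores)))
```

In this sketch the most confident slot is committed first, which lets later fills (in a real LM) condition on already-decoded tokens; the paper's other strategies, such as independent or strictly left-to-right decoding, differ only in the order in which slots are committed.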

