Conference on Empirical Methods in Natural Language Processing

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models



Abstract

Language models (LMs) have proven surprisingly successful at capturing factual knowledge by completing cloze-style fill-in-the-blank questions such as "Punta Cana is located in _." However, while knowledge is both written and queried in many languages, studies on LMs' factual representation ability have almost invariably been performed on English. To assess factual knowledge retrieval in LMs in different languages, we create a multilingual benchmark of cloze-style probes for 23 typologically diverse languages. To properly handle language variations, we expand probing methods from single- to multi-word entities, and develop several decoding algorithms to generate multi-token predictions. Extensive experimental results provide insights about how well (or poorly) current state-of-the-art LMs perform at this task in languages with more or fewer available resources. We further propose a code-switching-based method to improve the ability of multilingual LMs to access knowledge, and verify its effectiveness on several benchmark languages.
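The abstract mentions decoding algorithms for generating multi-token predictions, since an entity such as "Dominican Republic" spans several mask slots. A minimal sketch of one such strategy, confidence-based mask filling, is below. The `toy_scores` function is a hypothetical stand-in for an actual masked LM (a real probe would query a model such as multilingual BERT for a vocabulary distribution at each mask, re-scoring after every fill); the specific candidates and probabilities are illustrative only.

```python
def toy_scores(tokens, mask_index):
    """Hypothetical stand-in for a masked LM: returns a {candidate: probability}
    distribution for the [MASK] at mask_index. A real LM would condition on the
    tokens filled in so far; this toy table does not."""
    table = {
        5: {"Dominican": 0.9, "Puerto": 0.1},
        6: {"Republic": 0.8, "Rico": 0.2},
    }
    return table[mask_index]

def fill_masks_by_confidence(tokens, score_fn):
    """Repeatedly fill the single [MASK] whose best candidate has the highest
    probability, re-scoring the remaining masks after each fill."""
    tokens = list(tokens)
    while "[MASK]" in tokens:
        masks = [i for i, t in enumerate(tokens) if t == "[MASK]"]
        # For each open mask, find its best candidate, then take the
        # (position, candidate) pair with the highest confidence overall.
        best = max(
            ((i, *max(score_fn(tokens, i).items(), key=lambda kv: kv[1]))
             for i in masks),
            key=lambda t: t[2],
        )
        pos, word, _ = best
        tokens[pos] = word
    return tokens

prompt = ["Punta", "Cana", "is", "located", "in", "[MASK]", "[MASK]", "."]
print(" ".join(fill_masks_by_confidence(prompt, toy_scores)))
```

In this sketch the most confident slot is committed first, which lets later fills (in a real LM) condition on already-decoded tokens; the paper's other strategies, such as independent or strictly left-to-right decoding, differ only in the order in which slots are committed.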

