Investigating Language Impact in Bilingual Approaches for Computational Language Documentation


Abstract

For endangered languages, data collection campaigns have to accommodate the challenge that many of them are from oral tradition, and producing transcriptions is costly. Therefore, it is fundamental to translate them into a widely spoken language to ensure interpretability of the recordings. In this paper we investigate how the choice of translation language affects the subsequent documentation work and potential automatic approaches which will work on top of the produced bilingual corpus. To answer this question, we use the MaSS multilingual speech corpus (Boito et al., 2020) to create 56 bilingual pairs that we apply to the task of low-resource unsupervised word segmentation and alignment. Our results highlight that the choice of language for translation influences the word segmentation performance, and that different lexicons are learned by using different aligned translations. Lastly, this paper proposes a hybrid approach for bilingual word segmentation, combining boundary clues extracted from a non-parametric Bayesian model (Goldwater et al., 2009a) with the attentional word segmentation neural model from Godard et al. (2018). Our results suggest that incorporating these clues into the neural models' input representation increases their translation and alignment quality, especially for challenging language pairs.
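As a rough illustration of two ideas in the abstract, the sketch below enumerates ordered language pairs and inserts boundary clues into an unsegmented input sequence. It is not the authors' implementation. It assumes the eight languages of the MaSS corpus, and it assumes that the 56 pairs are ordered (source, translation) pairs, since 8 × 7 = 56. The function names, the `<b>` marker token and the toy input are all hypothetical.

```python
from itertools import permutations

# Assumption: the eight languages covered by the MaSS corpus.
MASS_LANGUAGES = [
    "Basque", "English", "Finnish", "French",
    "Hungarian", "Romanian", "Russian", "Spanish",
]


def bilingual_pairs(languages):
    """Return every ordered (unsegmented language, translation language) pair."""
    return list(permutations(languages, 2))


def add_boundary_clues(symbols, boundaries, marker="<b>"):
    """Insert `marker` after each symbol index listed in `boundaries`.

    `symbols` is an unsegmented sequence (for example phones or characters).
    `boundaries` holds 0-based indices of symbols after which a first-pass
    segmenter hypothesised a word boundary. The marker is a hypothetical
    token; a boundary after the final symbol is ignored.
    """
    boundary_set = set(boundaries)
    augmented = []
    for i, sym in enumerate(symbols):
        augmented.append(sym)
        if i in boundary_set and i < len(symbols) - 1:
            augmented.append(marker)
    return augmented


pairs = bilingual_pairs(MASS_LANGUAGES)
print(len(pairs))
print(add_boundary_clues(list("thedog"), [2]))
```

In a setup like this, the augmented sequence would replace the raw unsegmented sequence as input to the neural model, so the model can use or ignore the first-pass boundaries. How the paper actually encodes the clues is not stated in the abstract.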


Bibliographic details

Source
"Joint Spoken Language Technologies for Under-resourced Languages and Collaboration and Computing for Under-Resourced Languages Workshop" | 2020 | pp. 79-87 | 9 pages
Authors
Marcely Zanon Boito; Aline Villavicencio; Laurent Besacier
Original format
PDF
Keywords
word segmentation; sequence-to-sequence models; computational language documentation; attention mechanism
