Language Resources and Evaluation

The Corpus of American Danish: a language resource of spoken immigrant Danish in North and South America

Abstract

This paper describes the 'Corpus of American Danish' (CoAmDa), a newly established corpus of spoken immigrant Danish in North and South America. The CoAmDa comprises approx. 1.7 million tokens, making it one of the largest heritage-language corpora at present. In terms of text type, the CoAmDa is a non-standard, multilingual spoken-language resource: in every recording, Danish is mixed with one of American English, Canadian English or Argentine Spanish. The aim of this note is to document the relevant aspects and specifications of the CoAmDa, viz. the audio data, the sociodemographic metadata of the speakers, the digitization of analog data, the transcription procedures, the format and tagging of the speech files, and the internal validation procedures. In doing so, we wish to share our experience and best practices for building a high-quality spoken-language resource with the interested public, in particular with other researchers working on and with multilingual speech corpora.