Conference: LREC-2012

A Phonemic Corpus of Polish Child-Directed Speech

Abstract

Recent advances in modeling early language acquisition are due not only to the development of machine-learning techniques, but also to the increasing availability of data on child language and child-adult interaction. In the absence of recordings of child-directed speech, or when models explicitly require such a representation for training data, phonemic transcriptions are commonly used as input data. We present a novel (and to our knowledge, the first) phonemic corpus of Polish child-directed speech. It is derived from the Weist corpus of Polish, freely available from the seminal CHILDES database. For the sake of reproducibility, and to exemplify the typical trade-off between ecological validity and sample size, we report all preprocessing operations and transcription guidelines. Contributed linguistic resources include updated CHAT-formatted transcripts with phonemic transcriptions in a novel phonology tier, as well as by-product data, such as a phonemic lexicon of Polish. All resources are distributed under the LGPL-LR license.
