Conference: Linguistic annotation workshop

The making of the Litkey Corpus, a richly annotated longitudinal corpus of German texts written by primary school children

Abstract

To date, corpus and computational linguistic work on written language acquisition has mostly dealt with second language learners who have usually already mastered orthography acquisition in their first language. In this paper, we present the Litkey Corpus, a richly-annotated longitudinal corpus of written texts produced by primary school children in Germany from grades 2 to 4. The paper focuses on the (semi-)automatic annotation procedure at various linguistic levels, which include POS tags, features of the word-internal structure (phonemes, syllables, morphemes) and key orthographic features of the target words as well as a categorization of spelling errors. Comprehensive evaluations show that high accuracy was achieved on all levels, making the Litkey Corpus a useful resource for corpus-based research on literacy acquisition of German primary school children and for developing NLP tools for educational purposes.