Workshop on NLP for computer-assisted language learning

You get what you annotate: a pedagogically annotated corpus of coursebooks for Swedish as a Second Language

Abstract

We present the COCTAILL corpus, containing over 700,000 tokens of Swedish texts from 12 coursebooks aimed at second/foreign language (L2) learning. Each text in the corpus is labelled with a proficiency level according to the CEFR proficiency scale. Genres, topics, associated activities, vocabulary lists and other types of information are annotated in the coursebooks to facilitate Second Language Acquisition (SLA)-aware studies and experiments aimed at Intelligent Computer-Assisted Language Learning (ICALL). Linguistic annotation in the form of parts of speech (POS; e.g. nouns, verbs), base forms (lemmas) and syntactic relations (e.g. subject, object) has also been added to the corpus. In the article we describe our annotation scheme and the editor we have developed for the content mark-up of the coursebooks, including the taxonomy of pedagogical activities and linguistic skills. Inter-annotator agreement has been computed and reported on a subset of the corpus. Surprisingly, we have not found any other examples of pedagogically marked-up corpora based on L2 coursebooks whose experience we could draw on. Hence, our work may be viewed as "groping in the darkness" and eventually a starting point for others. The paper also presents our first quantitative exploration of the corpus, where we focus on textually and pedagogically annotated features of the coursebooks to exemplify what types of studies can be performed using the presented annotation scheme. We explore trends in the use of topics and genres over proficiency levels and compare the pedagogical focus of exercises across levels. The final section of the paper summarises the potential this corpus holds for research within SLA and various ICALL tasks.