3rd Workshop on NLP for computer-assisted language learning 2014

You get what you annotate: a pedagogically annotated corpus of coursebooks for Swedish as a Second Language


Abstract

We present the COCTAILL corpus, containing over 700,000 tokens of Swedish texts from 12 coursebooks aimed at second/foreign language (L2) learning. Each text in the corpus is labelled with a proficiency level according to the CEFR proficiency scale. Genres, topics, associated activities, vocabulary lists and other types of information are annotated in the coursebooks to facilitate Second Language Acquisition (SLA)-aware studies and experiments aimed at Intelligent Computer-Assisted Language Learning (ICALL). Linguistic annotation in the form of parts-of-speech (POS; e.g. nouns, verbs), base forms (lemmas) and syntactic relations (e.g. subject, object) has also been added to the corpus. In the article we describe our annotation scheme and the editor we have developed for the content mark-up of the coursebooks, including the taxonomy of pedagogical activities and linguistic skills. Inter-annotator agreement has been computed and reported on a subset of the corpus. Surprisingly, we have not found any other examples of pedagogically marked-up corpora based on L2 coursebooks to draw on existing experiences. Hence, our work may be viewed as "groping in the darkness" and eventually a starting point for others. The paper also presents our first quantitative exploration of the corpus, where we focus on textually and pedagogically annotated features of the coursebooks to exemplify what types of studies can be performed using the presented annotation scheme. We explore trends in the use of topics and genres over proficiency levels and compare the pedagogical focus of exercises across levels. The final section of the paper summarises the potential this corpus holds for research within SLA and various ICALL tasks.
