
Towards error annotation in a learner corpus of Portuguese

Abstract

In this article, we present COPLE2, a new corpus of Portuguese that encompasses written and spoken data produced by learners of Portuguese as a foreign or second language (FL/L2). Following the trend towards learner corpus research applied to less commonly taught languages, our aim is to enrich the learner data available for Portuguese L2. These data may be useful not only for educational purposes (design of learning materials, curricula, etc.) but also for the development of NLP tools to support students in their learning process. The corpus is available online through the TEITOK environment, a web-based framework for corpus treatment that provides several built-in NLP tools and a rich set of functionalities (multiple orthographic transcription layers, lemmatization and POS tagging, token normalization, error annotation) to automatically process and annotate texts in XML format. A CQP-based search interface allows searching the corpus by different fields, such as words, lemmas, POS tags or error tags. We describe the work in progress on the constitution and linguistic annotation of this corpus, focusing particularly on error annotation.
