Feature Analysis for Native Language Identification

Abstract

In this study, we investigate the role of different features in native language identification. For this purpose, we compile a learner corpus from a subset of the EF Cambridge Open Language Database (EFCAMDAT), developed at the University of Cambridge in collaboration with EF Education. The features we consider include character n-grams, positional token frequencies, part-of-speech n-grams, function words, shell nouns, and a set of annotated errors. Finally, we examine whether essays by English learners who share the same mother tongue can be distinguished by the learners' country of origin.
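
As a rough illustration of two of the feature types named in the abstract, character n-grams and function words, the following is a minimal sketch only. It is not the authors' implementation: the function names, the trigram default, the whitespace tokenization, and the tiny `FUNCTION_WORDS` list are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list (illustration only;
# the abstract does not say which list the study uses).
FUNCTION_WORDS = ("the", "a", "an", "of", "in", "on", "to", "and", "but", "that")


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def function_word_freqs(text: str) -> dict:
    """Relative frequency of each listed function word among whitespace tokens."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


if __name__ == "__main__":
    essay = "The cat sat on the mat"
    print(char_ngrams(essay, n=3).most_common(3))
    print(function_word_freqs(essay))
```

In a classification setup, such counts would typically be computed per essay and passed to a classifier as feature vectors. Which classifier and feature weighting the study uses is not stated in the abstract.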