
Feature Analysis for Native Language Identification


Abstract

In this study, we investigate the role of different features in the task of native language identification. For this purpose, we compile a learner corpus from a subset of the EF Cambridge Open Language Database (EFCAMDAT) [10], developed at the University of Cambridge in collaboration with EF Education. The features we consider include character n-grams, positional token frequencies, part-of-speech n-grams, function words, shell nouns, and a set of annotated errors. Finally, we examine whether essays by English learners who share the same mother tongue can be distinguished by the learners' country of origin.
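
To make two of the listed feature types concrete, the following is a minimal sketch of how character n-gram counts and function-word frequencies might be extracted from a learner essay. It is not the authors' implementation: the function names, the `char:`/`fw:` key prefixes, the default n-gram size of 3, and the tiny `FUNCTION_WORDS` list are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption,
# not the list used in the study).
FUNCTION_WORDS = ("the", "a", "of", "in", "to", "and", "that", "is")


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the case-folded text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def function_word_freqs(text):
    """Relative frequency of each listed function word among all tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


def extract_features(text, n=3):
    """Merge both feature families into one sparse feature dictionary."""
    features = {f"char:{gram}": count for gram, count in char_ngrams(text, n).items()}
    features.update(
        {f"fw:{word}": freq for word, freq in function_word_freqs(text).items()}
    )
    return features
```

A dictionary such as the one returned by `extract_features` could then be passed to any standard classifier that accepts sparse features; the choice of classifier is left open here because the abstract does not specify one.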
