Workshop on Innovative Use of NLP for Building Educational Applications

Maximizing Classification Accuracy in Native Language Identification


Abstract

This paper reports our contribution to the 2013 NLI Shared Task. The purpose of the task was to train a machine-learning system to identify the native-language affiliations of 1,100 texts written in English by nonnative speakers as part of a high-stakes test of general academic English proficiency. We trained our system on the new TOEFL11 corpus, which includes 11,000 essays written by nonnative speakers from 11 native-language backgrounds. Our final system used an SVM classifier with over 400,000 unique features consisting of lexical and POS n-grams occurring in at least two texts in the training set. Our system identified the correct native-language affiliations of 83.6% of the texts in the test set. This was the highest classification accuracy achieved in the 2013 NLI Shared Task.
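
The feature-selection rule mentioned in the abstract (keep only n-grams that occur in at least two training texts) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy sentences, and the `n_max` and `min_df` parameters are hypothetical, and the abstract does not state which n-gram lengths were used. The sketch covers word n-grams only; the same functions would presumably apply to sequences of POS tags. The SVM training step is omitted.

```python
from collections import Counter


def ngrams(tokens, n_max=3):
    """Yield every n-gram of length 1..n_max as a tuple of tokens."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_vocabulary(texts, n_max=3, min_df=2):
    """Map each n-gram found in at least min_df distinct texts to an index."""
    doc_freq = Counter()
    for tokens in texts:
        # A set ensures each text counts an n-gram at most once.
        doc_freq.update(set(ngrams(tokens, n_max)))
    kept = sorted(g for g, df in doc_freq.items() if df >= min_df)
    return {g: idx for idx, g in enumerate(kept)}


def vectorize(tokens, vocab, n_max=3):
    """Return a sparse count vector (feature index -> count) for one text."""
    counts = Counter(g for g in ngrams(tokens, n_max) if g in vocab)
    return {vocab[g]: c for g, c in counts.items()}


# Toy training set (invented for illustration, not taken from TOEFL11).
train = [
    "i am agree with this statement".split(),
    "i am agree that cars are useful".split(),
    "cars are useful in the city".split(),
]
vocab = build_vocabulary(train, n_max=2, min_df=2)
features = vectorize("i am agree".split(), vocab, n_max=2)
```

In this toy example, n-grams such as `("with",)` appear in only one text and are dropped, while `("am", "agree")` appears in two texts and is kept. The resulting sparse vectors are the kind of input a linear SVM would be trained on.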
