Venue: Workshop on Innovative Use of NLP for Building Educational Applications

Native Language Identification: a Simple n-gram Based Approach


Abstract

This paper describes our approaches to Native Language Identification (NLI) for the NLI Shared Task 2013. NLI, a subarea of author profiling, focuses on identifying an author's first language given a text written in their second language. Researchers have reported several feature sets that achieve relatively good performance on this task. The feature types used in such work include lexical, syntactic, and stylistic features, dependency parsers, psycholinguistic features, and grammatical errors. In our approaches, we selected lexical and syntactic features based on n-grams of characters, words, and Penn TreeBank (PTB) and Universal part-of-speech (POS) tags, together with perplexity values of character n-grams, to build four different models. We also combined all four models using an ensemble-based approach to obtain the final result. We evaluated our approach on a set of 11 native languages, reaching 75% accuracy.
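As a rough illustration of the kind of features the abstract mentions, the sketch below counts character and word n-grams and combines several per-model predictions. It is not the authors' implementation: the function names are hypothetical, the whitespace tokenization is a simplification, and majority voting is only an assumed stand-in, since the abstract does not say how the ensemble combines the four models.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in a text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count overlapping word n-grams (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def majority_vote(predictions):
    """Combine per-model labels by majority vote (assumed scheme).

    Ties are broken in favor of the label that appears first in the list.
    """
    counts = Counter(predictions)
    return max(predictions, key=lambda label: counts[label])


# Hypothetical outputs of four separate models for one essay.
model_outputs = ["German", "Hindi", "German", "Korean"]
final_label = majority_vote(model_outputs)
```

In a full system, counts like these would be turned into feature vectors for a classifier, with one classifier per feature type; the voting step only shows one simple way such outputs could be merged.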
