Native Language Identification: a Simple n-gram Based Approach

Abstract

This paper describes our approaches to Native Language Identification (NLI) for the NLI Shared Task 2013. NLI, a subarea of author profiling, focuses on identifying the first language of an author given a text written in the author's second language. Researchers have reported several feature sets that achieve relatively good performance on this task. The types of features used in such work are lexical, syntactic, and stylistic features, dependency parsers, psycholinguistic features, and grammatical errors. In our approaches, we selected lexical and syntactic features based on n-grams of characters, words, and Penn TreeBank (PTB) and Universal part-of-speech (POS) tags, together with perplexity values of character n-grams, to build four different models. We also combined all four models using an ensemble-based approach to obtain the final result. We evaluated our approach on a set of 11 native languages, reaching 75% accuracy.
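As a rough illustration of the kind of features the abstract names, the sketch below extracts character and word n-gram counts and combines per-model predictions by majority vote. It is not the authors' implementation: the function names are hypothetical, tokenization is plain whitespace splitting, and majority voting is an assumed combination rule, since the abstract says only that an ensemble-based approach was used.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count word n-grams using lowercase whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first.

    Majority voting is assumed here; the abstract does not state
    how the four models were combined.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    essay = "I am agree with this statement"
    print(char_ngrams(essay, 3).most_common(3))
    print(word_ngrams(essay, 2))
    # Hypothetical labels predicted by four separate models for one essay.
    print(majority_vote(["SPA", "SPA", "ITA", "SPA"]))
```

In a full system, count vectors like these would feed one classifier per feature type, and the vote would run over those classifiers' outputs.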

Bibliographic record

Source: Workshop on Innovative Use of NLP for Building Educational Applications, 2013, 8 pages
Authors: Binod Gyawali; Gabriela Ramirez; Thamar Solorio
Original format: PDF
Chinese Library Classification: Programming; Software engineering

Similar literature

1. Smoothed n-gram based models for tweet language identification: A case study of the Brazilian and European Portuguese national varieties [J]. Castro Dayvid W., Souza Ellen, Vitorio Douglas. Applied Soft Computing. 2017
2. Unsupervised acquisition of idiomatic units of symbolic natural language: An n-gram frequency-based approach for the chunking of news articles and tweets [J]. Dario Borrelli, Gabriela Gongora Svartzman, Carlo Lipizzi. PLoS One. 2020, no. 6
3. A Domain-Based Approach to Extract Arabic Person Names Using N-Grams and Simple Rules [J]. Mohammad Alhawarat. Asian Journal of Information Technology. 2015, no. 8
4. Native Language Identification: a Simple n-gram Based Approach [C]. Binod Gyawali, Gabriela Ramirez, Thamar Solorio. Workshop on Innovative Use of NLP for Building Educational Applications. 2013
5. Learning Chinese characters: A comparative study of the learning strategies of students whose native language is alphabet-based and students whose native language is character-based [D]. Arrow, Ju-Chuan. 2004
6. Unsupervised acquisition of idiomatic units of symbolic natural language: An n-gram frequency-based approach for the chunking of news articles and tweets [O]. Dario Borrelli, Gabriela Gongora Svartzman, Carlo Lipizzi. 2020
7. The Power of Character N-grams in Native Language Identification [O]. Artur Kulmizev, Bo Blankers, Johannes Bjerva. 2017
8. Investigation of Back-off Based Interpolation Between Recurrent Neural Network and N-gram Language Models (Author's Manuscript) [R]. Chen, X., Liu, X., Gales, M. J. F. 2016
