Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification

Abstract

In this paper we present work on the task of Native Language Identification (NLI). We present an alternative to the ICLE corpus, which has been used in most work up until now. We believe that our corpus, TOEFL11, is more suitable for the task of NLI and will allow researchers to better compare systems and results. We show that many of the features that have been commonly used in this task generalize to new and larger corpora. In addition, we examine possible ways of increasing current system performance (e.g., additional features and feature combination methods), and achieve overall state-of-the-art results (accuracy of 90.1%) on the ICLE corpus using an ensemble classifier that includes previously examined features and a novel feature (n-gram language models). We also show that training on a large corpus and testing on a smaller one works well, but not vice versa. Finally, we show that system performance varies across proficiency scores.
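The abstract names n-gram language models as a novel NLI feature but does not describe how they are built. As a rough illustration of the general idea only, the Python sketch below trains one character-bigram model per native language (L1) and scores an essay under each; the per-L1 scores can serve as classifier features or, on their own, as a simple predictor. The character unit, bigram order, add-one smoothing, and every name used (CharNgramLM, lm_features, predict_l1, the labels L1_A and L1_B) are assumptions for illustration, not the authors' configuration.

```python
import math
from collections import Counter

# Hypothetical sketch: character unit, n-gram order, and add-one smoothing are
# illustrative assumptions, not the setup used in the paper.


def char_ngrams(text, n):
    """Character n-grams of text, padded with '^' (start) and '$' (end) markers."""
    padded = "^" * (n - 1) + text + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramLM:
    """Character n-gram language model with add-one (Laplace) smoothing."""

    def __init__(self, n=2):
        self.n = n
        self.ngram_counts = Counter()    # counts of full n-grams
        self.context_counts = Counter()  # counts of their (n-1)-character histories
        self.vocab = set()               # characters seen in the predicted position

    def train(self, texts):
        for text in texts:
            for gram in char_ngrams(text, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
                self.vocab.add(gram[-1])
        return self

    def log_prob(self, text):
        """Smoothed natural-log probability of text under this model."""
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        total = 0.0
        for gram in char_ngrams(text, self.n):
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + v
            total += math.log(numerator / denominator)
        return total


def lm_features(text, models):
    """One feature per L1: log-probability per n-gram, so essay length does not dominate."""
    return {
        l1: lm.log_prob(text) / len(char_ngrams(text, lm.n))
        for l1, lm in models.items()
    }


def predict_l1(text, models):
    """Standalone use: pick the L1 whose model scores the text highest."""
    scores = lm_features(text, models)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy training data with hypothetical L1 labels, for illustration only.
    models = {
        "L1_A": CharNgramLM(n=2).train(["the cat sat"]),
        "L1_B": CharNgramLM(n=2).train(["el gato se sento"]),
    }
    print(lm_features("the cat", models))
    print(predict_l1("the cat", models))
```

Dividing by the number of n-grams keeps longer essays from receiving systematically lower scores. In an ensemble such as the one the abstract describes, scores like these would be one feature group alongside the others.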

Bibliographic Details

Source: International Conference on Computational Linguistics, 2012, pp. 2585-2601 (17 pages)
Authors: Joel Tetreault; Daniel Blanchard; Aoife Cahill; Martin Chodorow
Format: PDF
Keywords: Native Language Identification; Text Classification; Corpora

Similar Documents

1. Native American Languages As Heritage Mother Tongues [J]. Teresa L. McCarty. Language, Culture and Curriculum, 2008, No. 3.

机译：美洲原住民语言作为传统母语
2. Native Language Identification of Fluent and Advanced Non-Native Writers [J]. Sarwar Raheem, Rutherford Attapol T., Hassan Saeed-Ul. ACM Transactions on Asian and Low-Resource Language Information Processing, 2020, No. 4.

机译：流利和先进的非本土作家的母语识别
3. Vowel identification in temporal-modulated noise for native and non-native listeners: Effect of language experience [J]. Guan Jingjing, Liu Chang, Tao Sha. The Journal of the Acoustical Society of America, 2015, No. 3aPta1.

机译：本地和非本地听众在时间调制噪声中的元音识别：语言体验的影响
4. Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification [C]. Joel Tetreault, Daniel Blanchard, Aoife Cahill. International Conference on Computational Linguistics, 2012.

机译：母语，丢失和发现：母语识别的资源和经验评估
5. Native Language Identification Using Phonetic Algorithms [D]. Smiley, Charese H., 2018.

机译：使用拼音算法的母语语言识别
6. Native Language Influence on Brass Instrument Performance: An Application of Generalized Additive Mixed Models (GAMMs) to Midsagittal Ultrasound Images of the Tongue [O]. Matthias Heyne, Donald Derrick, Jalal Al-Tamimi. 2019.

机译：母语对黄铜仪器性能的影响：将广义添加剂混合模型（GAMMS）应用于舌头的中间超声图像
7. A Bibliography of Algonquian Syllabic Texts in Canadian Repositories, by John Murdoch; Books in Native Languages in the Rare Book Collections of the National Library of Canada / Livres en langues autochtones dans les collections de livres rares de la Bibliotheque nationale du Canada, comp. by Joyce M. Banks; Masinahikan: Native Language Imprints in the Archives and Libraries of the Anglican Church of Canada, comp. by Karen Evans; Resources for Native Peoples Studies, by Nora T. Corley [O]. Barry Edwards. 1985.

机译：在加拿大房地提提中的algonquian音节文本的书目，由John Murdoch; 在加拿大国家图书馆的罕见书籍中的母语书籍/ livres en an ansa antachtones dans les les collections de livres rares de la bibliotheque Nationale du Canada ，Comp。作者：Joyce M.银行; masinahikan：加拿大英国英国教教堂的档案和图书馆中的母语印记，comp。由凯伦埃文斯;诺拉T. Corley的祖国研究资源

Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification

摘要

著录项

相似文献

相关主题

期刊订阅