From Language to Family and Back: Native Language and Language Family Identification from English Text

Abstract

Revealing an anonymous author's traits from text is a well-researched area. In this paper, we aim to identify the native language and language family of a non-native English author from his/her English writings. We extract features from the text based on prior work, extend or modify them to construct different feature sets, and use support vector machines for classification. We show that, by introducing a novel method for incorporating language family information, native language identification accuracy on a 9-class task can be improved by up to 6.43%, depending on the feature set. In addition, we show that introducing grammar-based features improves the accuracy of both native language and language family identification.
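The abstract names the classifier (support vector machines) but not the feature templates or the family-integration method. As a rough illustration only, here is a minimal, dependency-free Python sketch of the general pipeline shape, under these assumptions: character trigram counts stand in for the paper's feature sets; each native-language label gets a one-vs-rest linear SVM trained with the Pegasos stochastic subgradient method (swapped in so the sketch needs no library); and a family label is obtained by simply mapping the predicted language to its family. That last step is a naive baseline, not the paper's novel method. All function names, labels and the family map are hypothetical.

```python
import random
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping lower-cased character n-grams (an assumed feature set)."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(feats, labels, positive, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 penalty, no bias term) trained with Pegasos.

    Samples whose label equals `positive` are treated as +1, all others as -1.
    """
    rng = random.Random(seed)
    w = {}
    order = list(range(len(feats)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            y = 1.0 if labels[i] == positive else -1.0
            x = feats[i]
            violated = y * dot(w, x) < 1.0  # margin checked before the update
            for f in w:  # shrink step: gradient of the L2 penalty
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss subgradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def train_one_vs_rest(texts, labels, **kwargs):
    """Train one binary SVM per native-language label."""
    feats = [char_ngrams(t) for t in texts]
    return {c: train_binary_svm(feats, labels, c, **kwargs) for c in sorted(set(labels))}


def predict_language(models, text):
    """Return the label whose SVM assigns the text the highest score."""
    x = char_ngrams(text)
    return max(models, key=lambda c: dot(models[c], x))


def predict_family(models, family_of, text):
    """Naive baseline: map the predicted native language to its family."""
    return family_of[predict_language(models, text)]


if __name__ == "__main__":
    # Toy strings stand in for essays; labels and families are hypothetical.
    texts = ["aaaa aaa", "aaa aaaa", "bbbb bbb", "bbb bbbb", "cccc ccc"]
    labels = ["lang_a", "lang_a", "lang_b", "lang_b", "lang_c"]
    family_of = {"lang_a": "family_1", "lang_b": "family_1", "lang_c": "family_2"}
    models = train_one_vs_rest(texts, labels)
    print(predict_language(models, "aaaaaaa"))           # lang_a
    print(predict_family(models, family_of, "ccccccc"))  # family_2
```

One-vs-rest is used here only because it keeps each classifier a plain binary SVM; a real replication would use the paper's own feature sets and an established SVM implementation.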

Bibliographic record

Source
Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2013, 8 pages
Authors
Ariel Stolerman; Aylin Caliskan Islam; Rachel Greenstadt
Original format: PDF
Chinese Library Classification: programming, software engineering

Similar documents

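1. A Case Analysis of Pronunciation: Using Native-Language Background to Understand the Pronunciation Problems of Chinese Learners of English (in English) [J]. Gao Lili (高莉莉). Chinese Journal of Applied Linguistics (English edition). 2005, No. 2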
2. Corrosion behaviour of Cu24Zn5Al alloy in a sodium tetraborate solution in the presence of 1-phenyl-5-mercaptotetrazole [J]. Antonijevic M M, Gardic V R, Gupta V K. Indian Journal of Chemical Technology. 2014, No. 5a6
3. The Effects of English Language Proficiency and Scientific Reasoning Skills on the Acquisition of Science Content Knowledge by Hispanic English Language Learners and Native English Language Speaking Students [J]. Hector N. Torres, Dana L. Zeidler. Electronic Journal of Science Education. 2002, No. 3
4. Language use in a 'one parent-one language' Mandarin-English bilingual family: noun versus verb use and language mixing compared to maternal perception [J]. Qiu Chen, Winsler Adam. International Journal of Bilingual Education and Bilingualism. 2017, No. 3
5. From Language to Family and Back: Native Language and Language Family Identification from English Text [C]. Ariel Stolerman, Aylin Caliskan Islam, Rachel Greenstadt. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013
6. The Bilingual Learner's Journey into English: How Families Preserve the Native Language and Cultural Identity of English Language Learners [D]. Carrillo, Ruben. 2019
7. The Impact of Orthography on Text Production in Three Languages: Catalan, English and Spanish [O]. Anna Llaurado, Julie E. Dockrell. 2020
8. Text Language Identification Using Letters (Frequency, Self-information, and Entropy) Analysis for English, French, and German Languages [O]. Rasha Hassan Abbas, Firas Abdul Elah Abdul Kareem. 2019