Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

From Language to Family and Back: Native Language and Language Family Identification from English Text



Abstract

Revealing an anonymous author's traits from text is a well-researched area. In this paper we aim to identify the native language and language family of a non-native English author from his or her English writing. We extract features from the text based on prior work, extend or modify them to construct different feature sets, and use support vector machines (SVMs) for classification. We show that, by introducing a novel method to incorporate language family information, native language identification accuracy on a 9-class task can be improved by up to 6.43%, depending on the feature set. In addition, we show that introducing grammar-based features improves the accuracy of both native language and language family identification.
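The abstract does not specify the feature sets or how language family information is incorporated, so the following is only a rough illustration under stated assumptions, not the paper's method. It assumes character bigram counts as features (a hypothetical stand-in), a linear SVM trained with the Pegasos subgradient method in a one-vs-rest setup, and a simple two-stage hierarchy that predicts the language family first and then the native language within that family. The hierarchy is one common way to use family labels and is not claimed to be the authors' novel technique; all function and class names are hypothetical.

```python
from collections import Counter
import random


def char_ngrams(text, n=2):
    """Bag of lowercase character n-grams (a hypothetical stand-in feature set)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_binary_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM (hinge loss + L2) trained with Pegasos subgradient steps.

    y holds +1/-1 labels. No bias term, to keep the sketch short.
    """
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # Pegasos step size
            margin = y[i] * dot(w, X[i])    # margin under the current weights
            for f in w:                     # shrink: w <- (1 - eta*lam) * w
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:                # hinge loss is active: step toward y*x
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


class OneVsRestSVM:
    """One binary SVM per class; predict the class with the highest score."""

    def fit(self, X, labels):
        self.classes = sorted(set(labels))
        self.w = {c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels])
                  for c in self.classes}
        return self

    def predict(self, x):
        return max(self.classes, key=lambda c: dot(self.w[c], x))


def fit_two_stage(X, langs, family_of):
    """Hypothetical hierarchy (not the paper's method): family classifier,
    then one native-language classifier per family."""
    fam_clf = OneVsRestSVM().fit(X, [family_of[lab] for lab in langs])
    lang_clfs = {}
    for fam in fam_clf.classes:
        rows = [(x, lab) for x, lab in zip(X, langs) if family_of[lab] == fam]
        lang_clfs[fam] = OneVsRestSVM().fit([x for x, _ in rows],
                                            [lab for _, lab in rows])
    return fam_clf, lang_clfs


def predict_two_stage(model, x):
    """Predict the family first, then the native language within that family."""
    fam_clf, lang_clfs = model
    fam = fam_clf.predict(x)
    return fam, lang_clfs[fam].predict(x)


# Toy usage with made-up "essays" whose classes use disjoint letters.
if __name__ == "__main__":
    texts = ["aaaa", "aaaaa", "bbbb", "bbbbbb", "cccc", "ccccc"]
    langs = ["L1", "L1", "L2", "L2", "L3", "L3"]
    family_of = {"L1": "F1", "L2": "F1", "L3": "F2"}
    X = [char_ngrams(t) for t in texts]
    model = fit_two_stage(X, langs, family_of)
    print(predict_two_stage(model, char_ngrams("bbb")))  # ('F1', 'L2')
```

A known drawback of a hard two-stage pipeline is that a wrong family prediction cannot be corrected at the language stage. On real essays, where character n-grams overlap heavily across languages, a bias term, richer features (such as the grammar-based features the abstract mentions), and a tuned regularization strength would matter far more than in this toy example.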
