2012 12th UK Workshop on Computational Intelligence

Feature selection for detecting language levels in L2 English Learners

Abstract

This study analyses features that enable classifiers to detect language levels in adult second-language (L2) learners of English. Forty-six speech samples from speakers of 15 different native (L1) languages were selected from the Learning Prosody in a Foreign Language (LeaP) corpus [1]. Using different groupings of features extracted from the speakers' L2 English, a Support Vector Machine (SVM) was trained and the speakers were classified into three categories: c1, c2, and s1. These categories correspond to beginner, intermediate, and advanced levels of English, and they carry the same names as the categories assigned by the human annotators of the LeaP corpus. The features are grouped into four sub-categories: sentence, syllable, duration, and pitch. Count features, such as sentence word count and sentence article count, had the greatest influence on the system, while the sentence features had the second greatest influence. Surprisingly, most of the pitch features had no effect on accuracy. A small common-word list was also used and proved very helpful. The edit distance measure of the sentences with the common words removed had a positive effect; measurable differences were found between sentences with and without the common words. Because the training and test sets were small, the grouping of the speakers' L1 languages had a significant effect on the accuracy of the classification predictions: certain combinations of L1 training and test sets yielded higher accuracy, depending on which L1 languages were used in training or testing. The classification predictions varied by as much as 40%.
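
The edit distance feature mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say what each sentence is compared against, which words are on the common-word list, or whether the distance is computed over words or characters. The code below therefore assumes a word-level Levenshtein distance between a learner's sentence and a reference sentence, and uses a hypothetical `COMMON_WORDS` set; all function names are illustrative and are not taken from the paper.

```python
# Hypothetical common-word list; the list used in the study is not given.
COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,;:!?\"'()") for word in sentence.lower().split())
    return [token for token in tokens if token]


def remove_common(tokens, common=COMMON_WORDS):
    """Drop every token that appears in the common-word list."""
    return [token for token in tokens if token not in common]


def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sentence_distances(spoken, reference):
    """Assumed feature pair: distance with and without common words."""
    s, r = tokenize(spoken), tokenize(reference)
    return {
        "with_common": edit_distance(s, r),
        "without_common": edit_distance(remove_common(s), remove_common(r)),
    }


if __name__ == "__main__":
    print(sentence_distances("The tiger jumped in river",
                             "The tiger jumped into the river."))
```

Under these assumptions, the two distances for one sentence could be passed to a classifier as separate features, which is one way the reported difference between sentences with and without common words might be exposed to the SVM.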
