Journal: International Journal of Computational Linguistics and Applications

A Readable Read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity

Abstract

Corpora and web texts can become a rich language learning resource if we have a means of assessing whether they are linguistically appropriate for learners at a given proficiency level. In this paper, we address this issue by presenting the first approach for predicting linguistic complexity for Swedish second language learning material on a 5-point scale. After showing that the traditional Swedish readability measure, Läsbarhetsindex (LIX), is not suitable for this task, we propose a supervised machine learning model, based on a range of linguistic features, that can reliably classify texts according to their difficulty level. Our model obtained an accuracy of 81.3% and an F-score of 0.8, which is comparable to the state of the art in English and considerably higher than previously reported results for other languages. We further studied the utility of our features with single sentences instead of full texts, since sentences are a common linguistic unit in language learning exercises. We trained a separate model on sentence-level data with five classes, which yielded 63.4% accuracy. Although this is lower than the document-level performance, we achieved an adjacent accuracy of 92%. Furthermore, we found that using a combination of different features, compared to using lexical features alone, resulted in a 7% improvement in classification accuracy at the sentence level, whereas at the document level, lexical features were more dominant. Our models are intended for use in a freely accessible web-based language learning platform for the automatic generation of exercises.
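
The abstract refers to two quantities without defining them: the LIX readability score and adjacent accuracy. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. LIX is computed with its commonly cited formula (average sentence length plus the percentage of words longer than six letters). The regex-based tokenization and sentence splitting are simplifications chosen here. Adjacent accuracy is taken to mean the share of predictions that fall within one level of the gold label, with the five levels encoded as integers 1 to 5. That encoding and the function names `lix` and `adjacent_accuracy` are hypothetical.

```python
import re


def lix(text: str) -> float:
    """LIX = words / sentences + 100 * long_words / words.

    A "long word" has more than six characters. Tokenization is a
    simplification: words are runs of word characters (so digits also
    count), and sentences are split on '.', '!' and '?'.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100 * len(long_words) / len(words)


def adjacent_accuracy(gold: list[int], predicted: list[int]) -> float:
    """Share of predictions at most one level away from the gold label.

    Assumes the difficulty levels are encoded as consecutive integers.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    hits = sum(abs(g - p) <= 1 for g, p in zip(gold, predicted))
    return hits / len(gold)


if __name__ == "__main__":
    sample = "Jag läser en bok. Biblioteket ligger nära universitetet."
    # 8 words, 2 sentences, 2 long words: 8/2 + 100*2/8
    print(lix(sample))  # 29.0

    gold = [1, 2, 3, 4, 5]
    pred = [1, 3, 5, 4, 2]
    # Exact matches: 2 of 5; within one level: 3 of 5
    print(adjacent_accuracy(gold, pred))  # 0.6
```

Because LIX depends only on sentence length and word length, it ignores lexical, morphological and syntactic information, which is one plausible reason a surface measure of this kind could fall short for grading learner material. Adjacent accuracy is more lenient than exact accuracy because it counts off-by-one predictions as correct, which is why the sentence-level figures of 63.4% and 92% can differ so widely.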
