Venue: International Conference on Text, Speech and Dialogue

Incremental Dependency Parsing and Disfluency Detection in Spoken Learner English

Abstract

This paper investigates the suitability of state-of-the-art natural language processing (NLP) tools for parsing the spoken language of second language learners of English. Parsing spoken learner language is important to the domains of automated language assessment (ALA) and computer-assisted language learning (CALL). Spoken language is non-canonical, containing filled pauses, nonstandard grammatical variations, hesitations and other disfluencies; compounded by a lack of available training data, this has made spoken language parsing a challenge for standard NLP tools. Recently the Redshift parser (Honnibal et al., Proceedings of CoNLL, 2013) has been shown to be successful in identifying grammatical relations and certain disfluencies in native speaker spoken language, returning an unlabelled dependency accuracy of 90.5% and a disfluency F-measure of 84.1% (Honnibal & Johnson, TACL 2, 131-142, 2014). We investigate how this parser handles spoken data from learners of English at various proficiency levels. First, we find that Redshift's parsing accuracy on non-native speech data is comparable to Honnibal & Johnson's results, with 91.1% of dependency relations correctly identified. However, disfluency detection is markedly lower, with an F-measure of just 47.8%. We attempt to explain why this should be, and investigate the effect of proficiency level on parsing accuracy. We relate our findings to the use of NLP technology for CALL and ALA applications.