Published in: Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Exploring word embeddings and phonological similarity for the unsupervised correction of language learner errors

Abstract

The presence of misspellings and other errors or non-standard word forms poses a considerable challenge for NLP systems. Although several supervised approaches have been proposed previously to normalize these, annotated training data is scarce for many languages. We investigate, therefore, an unsupervised method where correction candidates for Swedish language learners' errors are retrieved from word embeddings. Furthermore, we compare the usefulness of combining cosine similarity with orthographic and phonological similarity based on a neural grapheme-to-phoneme conversion system we train for this purpose. Although combinations of similarity measures have been explored for finding correction candidates, it remains unclear how these measures relate to each other and how much they contribute individually to identifying the correct alternative. We experiment with different combinations of these and find that integrating phonological information is especially useful when the majority of learner errors are related to misspellings, but less so when errors are of a variety of types including, e.g., grammatical errors.
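
The idea of ranking correction candidates by several similarity measures at once can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weighted sum, the equal default weights, the normalized edit distance used for both orthographic and phonological similarity, the function names, and the toy vectors and phoneme transcriptions are all assumptions made for illustration. The sketch also assumes that the erroneous form itself has an embedding, and a simple lookup table stands in for the neural grapheme-to-phoneme system.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def levenshtein(a, b):
    """Edit distance between two sequences (strings or phoneme tuples)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def sequence_similarity(a, b):
    """Edit distance rescaled to [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def rank_candidates(error, embeddings, to_phonemes, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank vocabulary words as corrections for `error`.

    The weighted sum is an assumed combination scheme, not taken from the paper.
    """
    w_cos, w_orth, w_phon = weights
    scored = []
    for candidate, vector in embeddings.items():
        if candidate == error:
            continue
        score = (w_cos * cosine(embeddings[error], vector)
                 + w_orth * sequence_similarity(error, candidate)
                 + w_phon * sequence_similarity(to_phonemes(error),
                                                to_phonemes(candidate)))
        scored.append((candidate, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
TOY_EMBEDDINGS = {
    "skola": [1.0, 0.0],
    "skolla": [0.9, 0.1],   # hypothetical learner misspelling
    "hund": [0.0, 1.0],
}
TOY_PHONEMES = {            # made-up transcriptions standing in for a G2P model
    "skola": ("s", "k", "u", "l", "a"),
    "skolla": ("s", "k", "o", "l", "a"),
    "hund": ("h", "u", "n", "d"),
}

ranking = rank_candidates("skolla", TOY_EMBEDDINGS, TOY_PHONEMES.get)
```

Setting one of the three weights to zero gives a simple way to compare combinations of measures, in the spirit of the experiments the abstract describes.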
