Conference proceedings: Human Language Technologies: The Baltic Perspective

Noisy-Channel Spelling Correction Models for Estonian Learner Language Corpus Lemmatisation

Abstract

Morphological analysis is an important task in Estonian learner language studies because it provides information about the words and forms that learners use. Learner texts frequently contain spelling errors, and a morphological analyser fails to find the correct analysis for misspelled words, so these texts should undergo an error correction step before conventional morphological analysis tools are applied. In this paper we compare several spelling correction models with the aim of improving the lemmatisation accuracy of learner language texts. Experiments show that the simplest non-word noisy-channel spelling correction model, with a disambiguation model applied on top of the morphological analyser output, performs best, while some of the more complicated models fail to beat even the baseline that includes no spelling correction.
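
As a rough illustration of the non-word noisy-channel idea named in the abstract, the sketch below picks, for a word missing from the lexicon, the candidate c that maximises P(c) · P(w | c). It is not the paper's implementation. The lexicon, its frequencies, the alphabet, the uniform error probability, and the function names `edits1` and `correct` are all assumptions made for this example. The paper's morphological analyser and disambiguation model are not modelled here.

```python
# Toy lexicon with made-up frequencies (hypothetical; a real system would
# draw its vocabulary from a morphological analyser or a large corpus).
LEXICON = {"kool": 50, "kooli": 30, "koos": 20, "tool": 10}
ALPHABET = "abcdefghijklmnopqrstuvwxyzõäöüšž"
ERROR_PROB = 0.05  # assumed, uniform probability of any single-edit error


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the candidate c maximising P(c) * P(word | c)."""
    if word in LEXICON:
        # Non-word model: words found in the lexicon are never changed.
        return word
    candidates = sorted(c for c in edits1(word) if c in LEXICON)
    if not candidates:
        return word  # nothing within one edit; leave the word as it is
    total = sum(LEXICON.values())
    # With a uniform channel model the score reduces to the word prior,
    # so the most frequent candidate wins.
    return max(candidates, key=lambda c: (LEXICON[c] / total) * ERROR_PROB)
```

For example, `correct("koo")` has two candidates one edit away, `kool` and `koos`, and returns `kool` because its assumed frequency is higher. A more realistic channel model would assign different probabilities to different edit types instead of one constant.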
