Journal: Computational Intelligence and Neuroscience

Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition



Abstract

Serbian belongs to a group of highly inflective, morphologically rich languages that use many different word suffixes to express grammatical, syntactic, or semantic features. This behaviour usually produces many recognition errors, especially in large vocabulary systems: even when good acoustic matching lets the automatic speech recognition system predict the correct lemma, the word ending is often wrong, and the word is still counted as an error. The effect is larger for contexts not present in the language model training corpus. This manuscript examines an approach to language modeling that takes into account different morphological categories of words, and presents the benefits in terms of word error rate and perplexity. The categories include word type, word case, grammatical number, and gender, and they were all assigned to words in the system vocabulary, where applicable. These additional word features produced significant improvements over the baseline system for both n-gram-based and neural network-based language models. The proposed system can help overcome many tedious errors in large vocabulary systems, for example in dictation, both for Serbian and for other languages with similar characteristics.
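To make the idea of morphological word features concrete, the following is a minimal sketch of a generic class-based bigram model, in which each vocabulary word carries a tag built from word type, case, number, and gender. It is not the authors' system: the factorization P(w | w_prev) = P(c | c_prev) * P(w | c), the toy lexicon, the tag format, the two-sentence corpus, and all function names are assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> "type|case|number|gender" tag.
# The tag format and entries are illustrative assumptions.
LEXICON = {
    "vidim": "VERB|-|sg|-",
    "je": "VERB|-|sg|-",
    "velika": "ADJ|nom|sg|f",
    "stara": "ADJ|nom|sg|f",
    "veliku": "ADJ|acc|sg|f",
    "kuća": "NOUN|nom|sg|f",
    "kuću": "NOUN|acc|sg|f",
}

# Hypothetical training corpus (two short Serbian sentences).
CORPUS = [
    ["vidim", "veliku", "kuću"],
    ["velika", "kuća", "je", "stara"],
]


def train(corpus, lexicon):
    """Collect class-bigram counts and word-within-class counts."""
    class_bigrams = Counter()   # (previous class, class) -> count
    class_context = Counter()   # previous class -> count as a context
    word_counts = Counter()     # word -> count
    class_counts = Counter()    # class -> count
    for sentence in corpus:
        classes = ["<s>"] + [lexicon[w] for w in sentence]
        for prev_c, cur_c in zip(classes, classes[1:]):
            class_bigrams[(prev_c, cur_c)] += 1
            class_context[prev_c] += 1
        for w in sentence:
            word_counts[w] += 1
            class_counts[lexicon[w]] += 1
    return class_bigrams, class_context, word_counts, class_counts


def class_bigram_prob(prev_word, word, lexicon, model):
    """Unsmoothed estimate of P(c | c_prev) * P(w | c)."""
    class_bigrams, class_context, word_counts, class_counts = model
    prev_c, cur_c = lexicon[prev_word], lexicon[word]
    if class_context[prev_c] == 0 or class_counts[cur_c] == 0:
        return 0.0
    p_class = class_bigrams[(prev_c, cur_c)] / class_context[prev_c]
    p_word = word_counts[word] / class_counts[cur_c]
    return p_class * p_word


model = train(CORPUS, LEXICON)
# The word pair "stara kuća" never occurs in CORPUS, yet the agreeing
# nominative ending is preferred over the accusative one because the
# class sequence ADJ-nom -> NOUN-nom was observed.
p_agreeing = class_bigram_prob("stara", "kuća", LEXICON, model)
p_wrong_ending = class_bigram_prob("stara", "kuću", LEXICON, model)
```

In this toy setting the morphological classes let the model prefer the correct word ending in a context absent from the training data, which is the kind of error the abstract describes. A practical system would add smoothing and would need to handle words with more than one possible tag.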
