Published in: European Conference on Speech Communication and Technology - EUROSPEECH 2003 (INTERSPEECH 2003), vol. 3; 1-4 September 2003; Geneva (CH)

Evaluation of the Stochastic Morphosyntactic Language Model on a One Million Word Hungarian Dictation Task

Abstract

In this article we evaluate our stochastic morphosyntactic language model (SMLM) on a Hungarian newspaper dictation task that requires modeling more than 1 million distinct word forms. The proposed method uses morphemes as the basic recognition units and combines a morpheme N-gram model with a morphosyntactic language model. The recognition system is built on the weighted finite-state transducer (WFST) paradigm. Thanks to this flexible transducer-based architecture, the morphosyntactic component integrates seamlessly with the basic modules, with no need to modify the decoder itself. We compare the phoneme, morpheme, and word error rates, as well as the sizes of the recognition networks, in two configurations: one that uses only the N-gram model and one that uses the combined model. Compared with the baseline trigram system, the proposed SMLM reduces the morpheme error rate by a relative 1.7% to 7.2%. The best configuration achieves a morpheme error rate of 18%, and the best word error rate is 22.3%.
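
The reductions quoted above are relative to the baseline error rate, not differences in percentage points. A minimal sketch of that arithmetic follows, assuming the usual definition (baseline minus improved, divided by baseline). The function names and the input numbers are hypothetical illustrations and are not taken from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error-rate reduction in percent of the baseline.

    Both arguments are error rates in percent (e.g. 20.0 for 20%).
    """
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - improved) / baseline


def absolute_reduction(baseline: float, improved: float) -> float:
    """Absolute reduction in percentage points."""
    return baseline - improved


if __name__ == "__main__":
    # Hypothetical error rates chosen only to illustrate the definitions;
    # they are NOT results reported in the paper.
    baseline_mer = 20.0
    improved_mer = 19.0
    print(relative_reduction(baseline_mer, improved_mer))  # percent of baseline
    print(absolute_reduction(baseline_mer, improved_mer))  # percentage points
```

With these made-up inputs, a drop of one percentage point corresponds to a relative reduction of five percent, which is why the two ways of reporting an improvement should not be mixed.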