Journal: IEEE Transactions on Audio, Speech, and Language Processing
Learning a Discriminative Weighted Finite-State Transducer for Speech Recognition

Abstract

Weighted finite-state transducers (WFSTs) have been widely adopted as efficient representations of a general speech recognition model. The WFST for a speech recognizer is typically composed from several components (the language model, the pronunciation mapping, and the acoustic model), which are estimated separately without any end-to-end optimization. This paper examines how the weights of such transducers can be learned in a manner that captures the interaction between the components. The paths in the transducer are represented as n-grams defined over the input and output sequences, whose linear weights are learned using a discriminative criterion. The resulting linear model factors into two weighted finite-state acceptors (WFSAs), which can be applied as corrections to the input and the output side of the initial WFST. This formulation allows duration cues to be incorporated seamlessly. Empirical results on a large-vocabulary Arabic GALE task demonstrate that the proposed model reduces word error rate substantially, by 1.5%-1.7% absolute. Through a series of experiments, we analyze the contributions from and interactions between the acoustic, duration, and language components, and find that duration cues play an important role in a large-vocabulary Arabic speech recognition task. Although this paper focuses on speech recognition, the proposed framework for learning the weights of a finite-state transducer is more general in nature and can be applied to other tasks such as utterance classification.
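The idea of scoring a transducer path with linear weights on n-grams of its input and output sequences can be illustrated with a small sketch. It is not the paper's implementation: the abstract only says "a discriminative criterion", so the structured-perceptron update below is an assumed stand-in, and every name (`path_features`, `side_score`, `perceptron_update`) and the toy labels are hypothetical. Duration cues and the actual WFST machinery are left out.

```python
from collections import Counter


def ngrams(seq, n):
    """Count all k-grams (k = 1..n) of a label sequence, as tuples."""
    feats = Counter()
    for k in range(1, n + 1):
        for i in range(len(seq) - k + 1):
            feats[tuple(seq[i:i + k])] += 1
    return feats


def path_features(inputs, outputs, n=2):
    """Features of one path: n-grams over its input labels and, separately,
    over its output labels. The "in"/"out" tag keeps the two sides apart."""
    feats = Counter()
    for gram, count in ngrams(inputs, n).items():
        feats[("in", gram)] += count
    for gram, count in ngrams(outputs, n).items():
        feats[("out", gram)] += count
    return feats


def score(weights, feats):
    """Linear score: dot product of the weight vector and feature counts."""
    return sum(weights.get(f, 0.0) * c for f, c in feats.items())


def side_score(weights, feats, side):
    """Score contributed by one side only ("in" or "out")."""
    return sum(weights.get(f, 0.0) * c for f, c in feats.items() if f[0] == side)


def perceptron_update(weights, reference, hypotheses, lr=1.0, n=2):
    """One structured-perceptron step (an ASSUMED criterion, not the paper's):
    if the best-scoring hypothesis differs from the reference path, move the
    weights toward the reference features and away from the hypothesis."""
    best = max(hypotheses, key=lambda p: score(weights, path_features(*p, n=n)))
    if best != reference:
        for f, c in path_features(*reference, n=n).items():
            weights[f] = weights.get(f, 0.0) + lr * c
        for f, c in path_features(*best, n=n).items():
            weights[f] = weights.get(f, 0.0) - lr * c
    return best


# Toy paths: (input label sequence, output word sequence). Labels are made up.
reference = (("a1", "a2", "b1"), ("ab",))
competitor = (("a1", "b1", "b1"), ("abb",))

weights = {}
perceptron_update(weights, reference, [competitor, reference])
```

Because each feature belongs to exactly one side, the total score is the sum of an input-side score and an output-side score. That separation is the property that lets a model of this shape be split into two acceptors, one per side, as the abstract describes.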