Journal: IEEE Transactions on Audio, Speech, and Language Processing

Structural Classification Methods Based on Weighted Finite-State Transducers for Automatic Speech Recognition



Abstract

Structural classification methods for automatic speech recognition (ASR) have attracted interest in the speech community because they allow the acoustic and linguistic aspects of a recognizer to be modeled in a unified way. However, these approaches involve a well-known tradeoff between the richness of the features and the computational efficiency of the decoder. For example, if a frame-synchronous one-pass decoding technique is to be used, the features used to calculate the likelihood of each hypothesis must be restricted to the same form as the conventional acoustic and language models. This paper tackles this limitation directly by exploiting the structure of the weighted finite-state transducers (WFSTs) used for decoding. Although WFST arcs provide rich contextual information, close integration with a computationally efficient decoding technique is still possible, since most decoding techniques only require that their likelihood functions be factorizable over decoder arcs and time frames. In this paper, we compare two methods for structural classification with WFST-based features: the structured perceptron and the conditional random field (CRF). To analyze the advantages of these two classifiers, we present experimental results for the TIMIT continuous phoneme recognition task, the WSJ transcription task, and the MIT lecture transcription task. We confirmed that the proposed approach improved ASR performance without sacrificing the computational efficiency of the decoders, even though the baseline systems were already trained with discriminative training techniques (e.g., MPE).
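The factorization requirement mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation. It assumes a toy setting in which each WFST arc is a hypothetical integer ID, a path is a sequence of arc IDs, and time frames are ignored for brevity. The names `path_score`, `perceptron_update`, and `base_scores` are invented for illustration. Because the score is a plain sum of per-arc terms, a learned per-arc correction can be added to the baseline score without changing how a decoder searches.

```python
from collections import Counter

def path_score(path, weights, base_scores):
    """Score a path as a sum of per-arc terms: baseline score plus learned correction."""
    return sum(base_scores.get(arc, 0.0) + weights.get(arc, 0.0) for arc in path)

def perceptron_update(weights, reference_path, hypothesis_path, step=1.0):
    """Structured perceptron step: reward reference arcs, penalize decoded arcs."""
    delta = Counter(reference_path)
    delta.subtract(Counter(hypothesis_path))
    for arc, diff in delta.items():
        if diff != 0:  # arcs shared by both paths cancel out
            weights[arc] = weights.get(arc, 0.0) + step * diff
    return weights

# Toy data (assumed, not from the paper): baseline log-scores per arc ID.
base_scores = {0: -1.0, 1: -0.5, 2: -0.4, 3: -1.2}
candidates = [(0, 1), (0, 2), (3, 1)]  # stand-in for the decoder's search space
reference = (0, 1)

weights = {}
best_before = max(candidates, key=lambda p: path_score(p, weights, base_scores))
if best_before != reference:
    perceptron_update(weights, reference, best_before)
best_after = max(candidates, key=lambda p: path_score(p, weights, base_scores))
```

In this toy run the baseline scores prefer the path `(0, 2)`. One update raises the weight of arc 1 and lowers that of arc 2, after which the reference path `(0, 1)` scores highest. A real system would replace the `max` over an explicit candidate list with a WFST decoder and would also index features by time frame.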
