IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009)

Voiced/unvoiced pattern-based duration modeling for language identification

Abstract

Most existing duration modeling approaches rely on a phone recognizer and require a manually annotated corpus to train the segmentation models, which is usually costly and time-consuming. In this paper, a novel duration modeling approach is proposed that requires neither a phone recognizer nor annotated training data and enables fast computation for language identification. In this approach, segmentation is performed using articulatory features such as voicing status. A pair of consecutive unvoiced and voiced segments is taken as the basic unit; the duration of each segment is normalized within each utterance and then quantized into 20 discrete ranges. The quantized units are treated as symbol sequences and modeled with n-gram models to capture temporal patterns, which are hypothesized to differ across languages. Experiments on the NIST LRE 2005 tasks show a relative 19.7% EER improvement when the proposed duration modeling-based system is added to a fusion system containing two GMM-UBM based acoustic systems using MFCC and pitch+intensity features.
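
As a rough illustration of the front end the abstract describes, the Python sketch below collapses a frame-level voicing decision into unvoiced+voiced units, normalizes segment durations within the utterance, and quantizes them into 20 discrete ranges to form a symbol sequence. The function names, the max-based normalization, the symbol labels, and the toy input are illustrative assumptions; the paper's exact normalization and quantization scheme is not specified here.

# Minimal sketch of the voiced/unvoiced duration front end (assumed details).
from itertools import groupby


def uv_segments(voicing):
    """Collapse a per-frame voicing decision (0 = unvoiced, 1 = voiced)
    into (label, duration_in_frames) runs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(voicing)]


def uv_units(segments):
    """Pair each unvoiced segment with the voiced segment that follows it,
    forming the unvoiced+voiced unit used in the abstract."""
    units = []
    for (lab_a, dur_a), (lab_b, dur_b) in zip(segments, segments[1:]):
        if lab_a == 0 and lab_b == 1:
            units.append((dur_a, dur_b))
    return units


def quantize_units(units, n_bins=20):
    """Normalize durations within the utterance (here: by the maximum
    duration, an assumption) and quantize each into n_bins ranges."""
    durations = [d for pair in units for d in pair]
    max_dur = max(durations) if durations else 1

    def q(d):
        # Map a duration to a bin index in [0, n_bins - 1].
        return min(int(n_bins * d / max_dur), n_bins - 1)

    return [f"U{q(u)}V{q(v)}" for u, v in units]


if __name__ == "__main__":
    # Toy frame-level voicing decisions for one utterance.
    voicing = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]
    symbols = quantize_units(uv_units(uv_segments(voicing)))
    print(symbols)  # symbol sequence to be scored by per-language n-gram models

The resulting symbol sequence would then be modeled with per-language n-gram models and, as in the reported experiments, fused with GMM-UBM acoustic systems.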
