Conference: IEEE Workshop on Automatic Speech Recognition and Understanding

Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions

Abstract

The task of language-independent acoustic unit modeling from unlabeled raw speech (the zero-resource setting) has gained significant interest in recent years. The main challenge is to extract acoustic representations that yield high similarity between instances of the same word or linguistic token spoken by different speakers, and to derive these representations in a language-independent manner. In this paper, we explore the use of Hidden Markov Model (HMM) based posteriograms for unsupervised acoustic unit modeling. The states of the HMM, which represent the language-independent acoustic units, are initialized using a Gaussian mixture model - universal background model (GMM-UBM). The trained HMM is then used to generate a temporally contiguous state alignment, which is modeled with a hybrid deep neural network (DNN). For testing, we use the frame-level HMM state posteriors obtained from the DNN as features for the ZeroSpeech challenge task. The minimal-pair ABX error rate is measured for both within-speaker and across-speaker pairs. In experiments on multiple languages from the ZeroSpeech corpus, we show that the proposed HMM-based posterior features provide significant improvements over the baseline system using MFCC features (average relative improvements of 25% for within-speaker pairs and 40% for across-speaker pairs). Furthermore, experiments in which the target language is not seen during training show that the proposed modeling approach is capable of learning global language-independent representations.
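A frame-level HMM state posteriorgram gives, for every frame, the probability of each HMM state given the whole utterance. The minimal sketch below computes it directly from an HMM with the log-domain forward-backward algorithm. This is an illustration under stated assumptions, not the authors' pipeline. Each state has a single diagonal-covariance Gaussian emission, which is a simplification standing in for the GMM-UBM-initialized states. The function names and toy parameters are hypothetical. In the paper, the posteriors used as features come from a DNN trained on the HMM state alignment, a step this sketch does not reproduce.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log-density of frame x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def state_posteriorgram(frames, means, variances, log_trans, log_init):
    """Return gamma[t][s] = P(state s at frame t | all frames) via forward-backward.

    Assumption (hypothetical setup): one diagonal Gaussian emission per state.
    frames: T x D feature vectors; means, variances: S x D emission parameters;
    log_trans[i][j] = log P(next state j | state i); log_init: length-S log priors.
    """
    T, S = len(frames), len(means)
    log_b = [[log_gauss_diag(x, means[s], variances[s]) for s in range(S)] for x in frames]
    # Forward pass: alpha[t][j] = log P(x_1..x_t, q_t = j)
    alpha = [[0.0] * S for _ in range(T)]
    for j in range(S):
        alpha[0][j] = log_init[j] + log_b[0][j]
    for t in range(1, T):
        for j in range(S):
            alpha[t][j] = log_b[t][j] + logsumexp(
                [alpha[t - 1][i] + log_trans[i][j] for i in range(S)])
    # Backward pass: beta[t][i] = log P(x_{t+1}..x_T | q_t = i); beta[T-1] = log 1 = 0
    beta = [[0.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(S):
            beta[t][i] = logsumexp(
                [log_trans[i][j] + log_b[t + 1][j] + beta[t + 1][j] for j in range(S)])
    # Combine and normalize each frame so its state posteriors sum to one.
    posteriorgram = []
    for t in range(T):
        log_gamma = [alpha[t][s] + beta[t][s] for s in range(S)]
        norm = logsumexp(log_gamma)
        posteriorgram.append([math.exp(g - norm) for g in log_gamma])
    return posteriorgram

# Toy usage: two states with 1-D emissions centered at 0 and 5 (made-up values).
frames = [[0.1], [-0.2], [4.9], [5.3]]
means = [[0.0], [5.0]]
variances = [[1.0], [1.0]]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]
post = state_posteriorgram(frames, means, variances, log_trans, log_init)
for row in post:
    print([round(p, 3) for p in row])
```

Each printed row is one frame of the posteriorgram. The log-domain recursion avoids underflow on long utterances, and a sticky transition matrix (high self-loop probability) encourages the temporally contiguous state segments the abstract describes.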
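The minimal-pair ABX error rate can be sketched as follows, assuming each token is a sequence of frame-level feature vectors (for example, posteriorgram rows). For a triplet (A, B, X), X is another instance of A's category and B belongs to a different category. The triplet counts as an error when X is closer to B than to A, and a tie counts as half an error. Several choices here are assumptions for illustration. The frame distance is cosine, DTW cost is normalized by n + m, triplets are averaged uniformly, and the function names are hypothetical. The official ZeroSpeech evaluation toolkit may use different distances and aggregate scores differently. Within-speaker and across-speaker scores differ in how triplets are sampled (whether X comes from the same speaker as A and B), not in this scoring step.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two frames (assumes no all-zero frames)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def dtw_distance(seq_a, seq_b, frame_dist=cosine_distance):
    """DTW alignment cost between two token sequences, normalized by n + m (an assumption)."""
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)

def abx_error_rate(triplets, dist=dtw_distance):
    """Fraction of (A, B, X) triplets where X (same category as A) is closer to B."""
    score, count = 0.0, 0
    for a, b, x in triplets:
        d_ax, d_bx = dist(a, x), dist(b, x)
        if d_ax > d_bx:
            score += 1.0   # error: X judged closer to the wrong category
        elif d_ax == d_bx:
            score += 0.5   # tie counts as half an error
        count += 1
    return score / count

# Toy two-state posteriorgrams (made-up values).
a = [[0.9, 0.1], [0.8, 0.2]]                  # token of category "x"
b = [[0.1, 0.9], [0.2, 0.8]]                  # token of a different category
x = [[0.85, 0.15], [0.9, 0.1], [0.7, 0.3]]    # another token of category "x"
print(abx_error_rate([(a, b, x)]))  # prints 0.0
```

DTW is used because two tokens of the same word rarely have the same number of frames. A lower ABX error means the features place tokens of the same category closer together than tokens of different categories.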