Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions

Abstract

The task of language-independent acoustic unit modeling in unlabeled raw speech (the zero-resource setting) has gained significant interest in recent years. The main challenge is to extract acoustic representations that yield high similarity between the same words or linguistic tokens spoken by different speakers, and to derive these representations in a language-independent manner. In this paper, we explore the use of Hidden Markov Model (HMM) based posteriograms for unsupervised acoustic unit modeling. The states of the HMM (which represent the language-independent acoustic units) are initialized using a Gaussian mixture model (GMM) universal background model (UBM). The trained HMM is subsequently used to generate temporally contiguous state alignments, which are then modeled in a hybrid deep neural network (DNN) model. For testing, we use the frame-level HMM state posteriors obtained from the DNN as features for the ZeroSpeech challenge task. The minimal-pair ABX error rate is measured for both within-speaker and across-speaker pairs. With several experiments on multiple languages in the ZeroSpeech corpus, we show that the proposed HMM-based posterior features provide significant improvements over the baseline system using MFCC features (average relative improvements of 25% for within-speaker pairs and 40% for across-speaker pairs). Furthermore, the experiments in which the target language is not seen during training illustrate that the proposed modeling approach is capable of learning global language-independent representations.
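As a rough illustration of what a frame-level HMM state posteriogram is, the following sketch computes state posteriors with the standard forward-backward recursions for an HMM with diagonal-covariance Gaussian emissions. It is not the authors' implementation: the function names, the toy means, variances, transition matrix, and the synthetic frames are all assumptions made up for this example, and the GMM-UBM initialization, HMM training, and the DNN stage described in the abstract are not covered.

```python
import numpy as np

def log_gaussian_diag(frames, means, variances):
    """Log-density of each frame under each diagonal Gaussian.
    frames: (T, D); means, variances: (K, D); returns (T, K)."""
    diff = frames[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (log_norm + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def hmm_posteriorgram(frames, means, variances, log_trans, log_init):
    """State posteriors P(state | all frames), shape (T, K).
    log_trans[i, j] is the log-probability of moving from state i to state j."""
    log_b = log_gaussian_diag(frames, means, variances)
    T, K = log_b.shape
    log_alpha = np.empty((T, K))
    log_beta = np.zeros((T, K))
    # Forward pass.
    log_alpha[0] = log_init + log_b[0]
    for t in range(1, T):
        log_alpha[t] = log_b[t] + logsumexp(log_alpha[t - 1][:, None] + log_trans, axis=0)
    # Backward pass.
    for t in range(T - 2, -1, -1):
        log_beta[t] = logsumexp(log_trans + (log_b[t + 1] + log_beta[t + 1])[None, :], axis=1)
    # Normalize per frame so that each row sums to one.
    log_gamma = log_alpha + log_beta
    log_gamma -= logsumexp(log_gamma, axis=1)[:, None]
    return np.exp(log_gamma)

# Toy setup (hypothetical values): 3 states, 2-dimensional features, 6 frames.
rng = np.random.default_rng(0)
K, D = 3, 2
means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
variances = np.ones((K, D))
trans = np.full((K, K), 0.1) + 0.7 * np.eye(K)  # each row sums to 1
init = np.full(K, 1.0 / K)
states = [0, 0, 1, 1, 2, 2]
frames = means[states] + 0.1 * rng.standard_normal((len(states), D))

post = hmm_posteriorgram(frames, means, variances, np.log(trans), np.log(init))
print(post.shape)            # one row of state posteriors per frame
print(post.argmax(axis=1))   # most likely state per frame
```

Each row of `post` is a probability distribution over HMM states for one frame. In the approach described above, the features used for the ABX evaluation are the state posteriors produced by the DNN trained on the HMM alignments, not the raw forward-backward posteriors shown here.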

Bibliographic record

Source
IEEE Workshop on Automatic Speech Recognition and Understanding | 2017 | 768p | 7 pages
Authors
T K Ansari; Rajath Kumar; Sonali Singh; Sriram Ganapathy; Susheela Devi
Original format: PDF
Chinese Library Classification: TN912-53
Keywords
Hidden Markov models; Speech; Training; Mel frequency cepstral coefficient; Task analysis; Entropy

Similar literature

1. Multilingual and unsupervised subword modeling for zero-resource languages [J] . Enno Hermann, Herman Kamper, Sharon Goldwater. Computer Speech and Language . 2021, Jan. issue

2. Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics Using Multiple Acoustic Models Under Noisy Environment [J] . Randy Gomez, Akinobu Lee, Hiroshi Saruwatari. 電子情報通信学会技術研究報告. 音声. Speech . 2004, no. 542

3. Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics Using Multiple Acoustic Models Under Noisy Environment [J] . Randy GOMEZ, Akinobu LEE, Hiroshi SARUWATARI. 電子情報通信学会技術研究報告. 音声. Speech . 2004, no. 542

4. Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions [C] . T K Ansari, Rajath Kumar, Sonali Singh, 2017 IEEE Automatic Speech Recognition and Understanding Workshop . 2017

5. Saudi college students' independent language learning strategies through multimedia resources: Perceptions of benefits and implications for language learning. [D] . AlMaghrabi, Budoor K. 2012

6. Enhancing African low-resource languages: Swahili data for language modelling [O] . Casper S. Shikali, Refuoe Mokhosi 2020

7. Acoustic and lexical resource constrained ASR using language-independent acoustic model and language-dependent probabilistic lexical model [O] . Ramya Rasipuram, Mathew Magimai-Doss 2015

