Computer Speech and Language

Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection

Abstract

The speech spectrum is known to change with variations in the length of a speaker's vocal tract, because speech formants are inversely related to the vocal tract length (VTL). The process of compensating for spectral variation due to vocal tract length is known as Vocal Tract Length Normalization (VTLN), an important speaker normalization technique for speech recognition and related tasks. In this paper, we use the Gaussian Posteriorgram (GP) of VTL-warped spectral features for a Query-by-Example Spoken Term Detection (QbE-STD) task, and present a Gaussian Mixture Model (GMM) framework for VTLN warping factor estimation. In particular, the presented GMM framework does not require phoneme-level transcription. We observe a correlation between the VTLN warping factor estimates obtained via a supervised HMM-based approach and an unsupervised GMM-based approach. In addition, phoneme recognition and speaker de-identification tasks are conducted using the GMM-based VTLN warping factor estimates. For QbE-STD, we consider three spectral features, namely, Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), and MFCC-TMP (which uses the Teager Energy Operator (TEO) to implicitly exploit magnitude and phase information within the MFCC framework). Linear frequency scaling variations of the VTLN warping factor are incorporated into these three cepstral representations for the QbE-STD task. The VTL-warped Gaussian posteriorgram improves the Maximum Term Weighted Value by 0.021 (i.e., 2.1%) and 0.015 (i.e., 1.5%) for the MFCC and PLP feature sets, respectively, on the evaluation set of the MediaEval SWS 2013 corpus. This improvement is primarily due to VTLN warping factor estimation with the unsupervised GMM framework. Finally, the effectiveness of the proposed VTL-warped GP is demonstrated for rescoring using various detection sources, such as the depth of the detection valley, the Self-Similarity Matrix, Pseudo Relevance Feedback, and weighted mean features. (C) 2019 Elsevier Ltd. All rights reserved.
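
The transcription-free warping factor estimation described in the abstract can be illustrated with a short sketch: a background GMM is trained on unwarped cepstral features, and for each speaker a warping factor alpha is chosen from a grid so that the VTL-warped features maximize the GMM likelihood. The snippet below is a minimal illustration of this idea under stated assumptions, not the paper's exact front end: warped_mfcc is a simplified stand-in that applies a linear frequency scaling to the power spectrum before the mel filterbank, the alpha grid of 0.88-1.12 is a typical choice rather than the paper's, and names such as train_utts, speaker_utts, and sr are assumed.

    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    ALPHAS = np.arange(0.88, 1.125, 0.02)  # assumed grid of VTLN warping factors

    def warped_mfcc(y, sr, alpha, n_mfcc=13, n_fft=512, hop=160):
        """MFCCs after a simple linear scaling of the frequency axis by alpha.
        A simplified stand-in for the VTL-warped front end (illustration only)."""
        S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
        freqs = np.linspace(0.0, sr / 2.0, S.shape[0])
        # place each frame's spectrum on the warped axis alpha * f, resample back onto f
        S_warp = np.stack([np.interp(freqs, alpha * freqs, frame) for frame in S.T], axis=1)
        mel = librosa.feature.melspectrogram(S=S_warp, sr=sr)
        return librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=n_mfcc).T  # (frames, n_mfcc)

    def estimate_warp_factor(utterances, sr, gmm):
        """Pick the warping factor whose features score highest under the background GMM.
        No phoneme-level transcription is involved, as in the abstract's GMM framework."""
        scores = []
        for alpha in ALPHAS:
            feats = np.vstack([warped_mfcc(y, sr, alpha) for y in utterances])
            scores.append(gmm.score(feats))  # mean per-frame log-likelihood
        return float(ALPHAS[int(np.argmax(scores))])

    # Illustrative usage (train_utts, speaker_utts, and sr are assumed names):
    # train_feats = np.vstack([warped_mfcc(y, sr, 1.0) for y in train_utts])
    # gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(train_feats)
    # alpha_spk = estimate_warp_factor(speaker_utts, sr, gmm)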
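
For the detection stage, the VTL-warped Gaussian posteriorgram can be paired with a subsequence DTW search of the kind commonly used with posteriorgram features in QbE-STD; the abstract does not spell out the exact search, so the sketch below is an assumption about one plausible form of the pipeline. Frame-level posteriors over the GMM components serve as the posteriorgram, and a length-normalized DTW cost over a negative-log inner-product distance scores how well the query matches a region of the test utterance. The additional detection sources mentioned in the abstract (detection-valley depth, Self-Similarity Matrix, Pseudo Relevance Feedback, weighted mean features) are not reproduced here.

    import numpy as np

    def gaussian_posteriorgram(feats, gmm):
        """Frame-level posteriors over the components of a fitted GMM (frames x components)."""
        return gmm.predict_proba(feats)  # gmm: e.g. the sklearn GaussianMixture trained above

    def dtw_detection_cost(query_gp, test_gp):
        """Length-normalized subsequence-DTW cost between two posteriorgrams.
        A lower cost suggests a more likely occurrence of the query in the test utterance."""
        eps = 1e-8
        dist = -np.log(np.maximum(query_gp @ test_gp.T, eps))  # (Q, T) local distances
        Q, T = dist.shape
        acc = np.full((Q, T), np.inf)
        acc[0, :] = dist[0, :]  # the query may start anywhere in the test utterance
        for i in range(1, Q):
            for j in range(1, T):
                acc[i, j] = dist[i, j] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
        return acc[-1].min() / Q  # the match may also end anywhere; normalize by query length

    # Illustrative usage, reusing the sketch above (query_wav, test_wav, sr, gmm,
    # and the per-utterance warp factors alpha_q / alpha_t are assumed names):
    # q_gp = gaussian_posteriorgram(warped_mfcc(query_wav, sr, alpha_q), gmm)
    # t_gp = gaussian_posteriorgram(warped_mfcc(test_wav, sr, alpha_t), gmm)
    # score = -dtw_detection_cost(q_gp, t_gp)  # higher score = stronger detection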
