Computer Speech and Language

Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection



Abstract

A speech spectrum is known to change with variations in the length of a speaker's vocal tract, because speech formant frequencies are inversely related to the vocal tract length (VTL). The process of compensating for spectral variation due to vocal tract length is known as Vocal Tract Length Normalization (VTLN), a very important speaker normalization technique for speech recognition and related tasks. In this paper, we use the Gaussian Posteriorgram (GP) of VTL-warped spectral features for a Query-by-Example Spoken Term Detection (QbE-STD) task. The paper presents a Gaussian Mixture Model (GMM) framework for VTLN warping factor estimation; in particular, this GMM framework does not require phoneme-level transcription. We observed a correlation between the VTLN warping factor estimates obtained via a supervised HMM-based approach and those obtained via the unsupervised GMM-based approach. In addition, phoneme recognition and speaker de-identification tasks were conducted using GMM-based VTLN warping factor estimates. For QbE-STD, we considered three spectral features, namely, Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), and MFCC-TMP (which uses the Teager Energy Operator (TEO) to exploit magnitude and phase information implicitly within the MFCC framework). Linear frequency scaling variations for the VTLN warping factor are incorporated into these three cepstral representations for the QbE-STD task. The VTL-warped Gaussian posteriorgram improved the Maximum Term Weighted Value by 0.021 (i.e., 2.1%) and 0.015 (i.e., 1.5%) for the MFCC and PLP feature sets, respectively, on the evaluation set of the MediaEval SWS 2013 corpus. This better performance is primarily due to VTLN warping factor estimation using the unsupervised GMM framework.
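The unsupervised estimation the abstract describes can be sketched as a maximum-likelihood grid search: warp an utterance's features for each candidate factor and keep the factor whose warped features score highest under a GMM trained on pooled, unwarped data. The sketch below is illustrative only; the feature warping step is a placeholder (a real system rescales the filterbank centre frequencies by the warp factor before cepstral analysis), and the grid range and GMM size are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def warped_features(utterance, alpha):
    """Hypothetical stand-in for extracting cepstral features with VTLN
    warp factor alpha; here we simply scale the features for illustration."""
    return utterance * alpha

# Pooled multi-speaker features; no phoneme-level transcription is needed.
train_feats = rng.normal(size=(2000, 13))

# Diagonal-covariance GMM fitted on unwarped training features.
gmm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(train_feats)

def estimate_warp_factor(utterance, gmm,
                         alphas=np.arange(0.88, 1.13, 0.02)):
    """Grid search: return the warp factor whose warped features have the
    highest average log-likelihood under the GMM."""
    scores = [gmm.score(warped_features(utterance, a)) for a in alphas]
    return float(alphas[int(np.argmax(scores))])

utt = rng.normal(size=(300, 13))
alpha_hat = estimate_warp_factor(utt, gmm)
print(alpha_hat)
```

Because the selection criterion is only the GMM likelihood, the same procedure applies to unlabeled audio, which is the property the paper exploits.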
Finally, the effectiveness of the proposed VTL-warped GP is demonstrated for rescoring using various detection sources, such as the depth of the detection valley, the Self-Similarity Matrix, Pseudo Relevance Feedback, and weighted mean features. (C) 2019 Elsevier Ltd. All rights reserved.
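A Gaussian posteriorgram, as used throughout the abstract, maps each feature frame to its vector of posterior probabilities over the GMM components. A minimal sketch with scikit-learn (synthetic features; the component count is an assumption):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
feats = rng.normal(size=(500, 13))
gmm = GaussianMixture(n_components=16, covariance_type="diag",
                      random_state=0).fit(feats)

def gaussian_posteriorgram(frames, gmm):
    """One row per frame: posterior probability of each GMM component
    given the frame; each row sums to 1."""
    return gmm.predict_proba(frames)

gp = gaussian_posteriorgram(rng.normal(size=(100, 13)), gmm)
print(gp.shape)  # (100, 16)
```

With VTL-warped features as input, the same mapping yields the VTL-warped GP the paper evaluates.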
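In QbE-STD, a spoken query is matched against search audio by aligning their posteriorgram sequences, typically with dynamic time warping (DTW). The sketch below uses the negative log inner product as the local distance, a common choice in posteriorgram-based QbE-STD; it is not necessarily the exact distance or normalization used in this paper.

```python
import numpy as np

def dtw_cost(query_gp, search_gp, eps=1e-10):
    """Length-normalized cumulative DTW cost between two posteriorgram
    sequences (rows are per-frame posterior vectors)."""
    # Local distance: -log of the frame-wise inner product.
    dist = -np.log(query_gp @ search_gp.T + eps)  # shape (Tq, Ts)
    Tq, Ts = dist.shape
    acc = np.full((Tq + 1, Ts + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, Tq + 1):
        for j in range(1, Ts + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[Tq, Ts] / (Tq + Ts)

# Toy example: uniform posteriors over 4 components.
q = np.full((5, 4), 0.25)
s = np.full((8, 4), 0.25)
c = dtw_cost(q, s)
print(round(c, 3))
```

Lower costs indicate likelier query occurrences; in practice the query is slid over the search audio (subsequence DTW) and detections are rescored with cues such as those listed in the abstract.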
