Dynamic Speech Spectrum Representation and Tracking Variable Number of Vocal Tract Resonance Frequencies With Time-Varying Dirichlet Process Mixture Models

Ozkan E.; Ozbek I.Y.; Demirekler M.


Abstract

In this paper, we propose a new approach for dynamic speech spectrum representation and for tracking vocal tract resonance (VTR) frequencies. The method represents the spectral density of the speech signal as a mixture of Gaussians with an unknown number of components, for which a time-varying Dirichlet process mixture (DPM) model is used. In the resulting representation, the number of formants is allowed to vary in time. The paper first presents an analysis of the continuity of the formants in the spectrum during a speech utterance. The analysis is based on a new state space representation of the concatenated tube model. We show that the number of formants that appear in the spectrum is directly related to the location of the constriction of the vocal tract (i.e., the location of the excitation). Moreover, the disappearance of formants from the spectrum is explained by "uncontrollable modes" of the state space model. Assuming that a varying number of formants exists in the spectrum, we propose a DPM-based multi-target tracking algorithm for tracking an unknown number of formants. The tracking algorithm defines a hierarchical Bayesian model for the unknown formant states, and inference is done via a Rao-Blackwellized particle filter.
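
As a rough illustration of the two ideas named in the abstract, the sketch below evaluates a Gaussian-mixture spectral density and computes the standard Chinese restaurant process probabilities with which a Dirichlet process mixture either reuses an existing component or opens a new one. It is a minimal static sketch, not the authors' time-varying DPM or their Rao-Blackwellized particle filter. The function names, the concentration parameter `alpha`, and all numeric values are assumptions chosen for illustration only.

```python
import math


def crp_probabilities(counts, alpha):
    """Chinese restaurant process predictive probabilities (static DP prior).

    counts[k] is the number of observations already assigned to component k.
    The last entry of the result is the probability of opening a new component.
    """
    total = sum(counts) + alpha
    return [n / total for n in counts] + [alpha / total]


def gaussian_mixture_density(freq_hz, weights, means_hz, stds_hz):
    """Evaluate a weighted sum of Gaussian densities at one frequency."""
    density = 0.0
    for w, mu, sigma in zip(weights, means_hz, stds_hz):
        z = (freq_hz - mu) / sigma
        density += w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return density


# Hypothetical values: three components standing in for three formants.
weights = [0.5, 0.3, 0.2]
means_hz = [500.0, 1500.0, 2500.0]
stds_hz = [100.0, 150.0, 200.0]

# Probability of joining each existing component, then of creating a new one.
probs = crp_probabilities([5, 3, 2], alpha=1.0)

# Mixture density at a component centre and between two components.
at_peak = gaussian_mixture_density(500.0, weights, means_hz, stds_hz)
between = gaussian_mixture_density(1000.0, weights, means_hz, stds_hz)
```

Because the last probability is always nonzero for `alpha > 0`, the number of mixture components is not fixed in advance, which is the property that lets the number of represented formants change.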


Bibliographic details

Source
IEEE Transactions on Audio, Speech, and Language Processing | 2009, No. 8 | pp. 1518-1532 | 15 pages
Authors
Ozkan E.; Ozbek I.Y.; Demirekler M.
Author affiliation

Dept. of Electr. & Electron. Eng., Middle East Tech. Univ., Ankara, Turkey;

Indexing information
Original format: PDF
Text language: English
Keywords
Bayes methods; Gaussian processes; particle filtering (numerical methods); speech processing; target tracking; Gaussian mixture model; Rao-Blackwellized particle filter; concatenated tube model; dynamic speech spectrum representation; hierarchical Bayesian model; multitarget tracking algorithm; spectral density; speech signals; speech utterance; state space representation; time-varying Dirichlet process mixture model; variable number tracking; vocal tract resonance frequencies; Dirichlet process; formant tracking; particle filter; spectral representation; spectrum estimation; vocal tract resonance (VTR);


