Nonstationary time series modeling with applications to speech signal processing.

Abstract

We develop statistical methods for the analysis of nonstationary time series and apply them to a variety of problems arising in speech signal processing. Information-carrying natural sound signals such as speech exhibit a degree of controlled nonstationarity, in that their statistical properties vary slowly over time. Faithfully modeling these temporal variations is extremely valuable for a wide range of applications. It can be accomplished by relying on well-understood acoustic models of speech production, which motivate many of the methods developed in this thesis.

First, we make a number of contributions to the classical problem of formant tracking, in which vocal tract resonances are estimated under the assumption that they are invariant on a 15-30 ms scale. Next, we relax this piecewise-stationarity constraint and model the temporal dynamics of the vocal tract using time-varying autoregressive (TVAR) models. We develop their algebraic and geometric properties, introduce several new estimators, and use TVAR models to construct a hypothesis test that detects the presence of vocal tract variation in speech waveform data. We study the test's asymptotic properties and illustrate its practical efficacy by detecting vocal tract changes across different timescales of speech dynamics.

We then explore how standard fixed-resolution short-time Fourier representations may be generalized to adapt to the time-frequency structure of a speech signal. To this end, we introduce a family of adaptive, linear time-frequency representations termed superposition frames. We show that they are invertible and numerically stable, and that they admit fast overlap-add reconstruction akin to standard short-time Fourier techniques. The general construction proceeds via a local, signal-adaptive modification of a Gabor frame. We give two signal-dependent schemes for selecting an appropriate superposition frame for signal analysis and illustrate the framework in the context of speech enhancement.

Finally, we introduce a joint model of the vocal tract and the source waveform in order to account for the source's quasi-periodic temporal variations during voicing. We incorporate an estimate of the source waveform into the traditional linear prediction framework via nonparametric wavelet regression. The resulting semi-parametric model is applied to various speech analysis problems, including formant estimation, source harmonics-to-noise ratio estimation, inverse filtering, and voicing detection.
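
Under the piecewise-stationarity assumption described above, one textbook way to estimate formants within a single 15-30 ms frame is autocorrelation-method linear prediction followed by root-finding on the predictor polynomial. The sketch below shows that generic baseline. It is not the thesis's formant tracker. The function names, the Hamming window, and the bandwidth cutoff `max_bw_hz` are illustrative assumptions.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC (assumed baseline): returns a[0..order-1] so that
    x[t] is predicted by sum_i a[i-1] * x[t - i]."""
    x = frame * np.hamming(len(frame))
    # Biased autocorrelation r[0..order] of the windowed frame
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Solve the Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def lpc_formants(frame, fs, order, max_bw_hz=400.0):
    """Estimate resonance frequencies (Hz) from the roots of the LPC polynomial."""
    a = lpc_coefficients(frame, order)
    # Prediction-error polynomial A(z) = 1 - sum_i a_i z^{-i}; its roots are the model poles
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]  # keep one root per complex-conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi  # standard pole-radius bandwidth formula
    return np.sort(freqs[bws < max_bw_hz])  # discard broad, non-formant poles
```

A formant tracker in this piecewise-stationary setting would apply `lpc_formants` to successive overlapping frames. The bandwidth threshold is a common heuristic for rejecting poles that model spectral tilt rather than resonances.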
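
A TVAR(p) model relaxes piecewise stationarity by letting each predictor coefficient drift: y[t] = a_1(t) y[t-1] + ... + a_p(t) y[t-p] + e[t]. One common way to make this estimable, and the one assumed here, is to expand each a_i(t) in a small set of basis functions (here, monomials in normalized time) and fit the expansion weights by ordinary least squares. The sketch also forms a Gaussian likelihood-ratio-style statistic that compares a constant-coefficient fit (q = 1) with a time-varying one, which illustrates the idea of testing for vocal tract variation. It is not the thesis's estimators or test, whose construction and asymptotics may differ. All function names and the basis choice are hypothetical.

```python
import numpy as np

def tvar_design(y, p, q):
    """Design matrix for a TVAR(p) model whose coefficients a_i(t) are expanded
    in q monomials of normalized time u in [-1, 1] (an assumed basis)."""
    n = len(y)
    t = np.arange(p, n)
    u = 2.0 * t / (n - 1) - 1.0
    basis = np.vander(u, q, increasing=True)  # columns 1, u, u^2, ...
    # Block i holds y[t - i] * f_j(t) for every basis function f_j
    cols = [basis * y[p - i : n - i, None] for i in range(1, p + 1)]
    return np.hstack(cols), y[p:], basis

def fit_tvar(y, p, q):
    """Least-squares fit; returns weights C (p x q), a_i(t) per sample, and the RSS."""
    X, target, basis = tvar_design(y, p, q)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    C = coef.reshape(p, q)  # C[i - 1, j]: weight of basis function j for lag i
    return C, basis @ C.T, float(resid @ resid)

def variation_statistic(y, p, q):
    """Likelihood-ratio statistic for H0: constant coefficients (q = 1) versus
    H1: a q-term expansion. Returns (statistic, degrees of freedom); under H0 and
    standard regularity conditions it is approximately chi-squared with p*(q-1) dof."""
    _, _, rss0 = fit_tvar(y, p, 1)
    _, _, rss1 = fit_tvar(y, p, q)
    n_eff = len(y) - p
    return n_eff * np.log(rss0 / rss1), p * (q - 1)
```

Because the q = 1 basis is nested in the q-term basis, the statistic is nonnegative. Large values indicate that the data favor coefficients that drift over the analysis window.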

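Superposition frames are built as signal-adaptive modifications of a Gabor frame and retain overlap-add reconstruction. For reference, the fixed-resolution baseline they generalize can be sketched as a short-time Fourier transform with a single window and hop, inverted by weighted overlap-add. The sketch below covers only that standard STFT, not superposition frames. The periodic Hann window, the 50% hop, and the function names are assumptions.

```python
import numpy as np

def stft(x, win, hop):
    """Fixed-resolution STFT: one windowed real FFT per hop (a Gabor-type analysis)."""
    n_win = len(win)
    starts = range(0, len(x) - n_win + 1, hop)
    return np.array([np.fft.rfft(win * x[s : s + n_win]) for s in starts])

def istft(frames, win, hop, length):
    """Weighted overlap-add synthesis. Dividing by the summed squared window gives
    the least-squares inverse wherever that sum is nonzero."""
    n_win = len(win)
    y = np.zeros(length)
    norm = np.zeros(length)
    for k, spec in enumerate(frames):
        s = k * hop
        y[s : s + n_win] += win * np.fft.irfft(spec, n=n_win)
        norm[s : s + n_win] += win ** 2
    nz = norm > 1e-12  # samples covered by at least one nonzero window value
    y[nz] /= norm[nz]
    return y
```

Every frame here shares one window length and hop, which fixes the time-frequency resolution. An adaptive representation would instead vary the analysis locally. How invertibility is preserved in that case is the subject of the thesis's construction and is not shown here.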
Bibliographic record

  • Author: Rudoy, Daniel.
  • Affiliation: Harvard University.
  • Degree-granting institution: Harvard University.
  • Subjects: Applied Mathematics; Statistics; Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 349
  • Original format: PDF
  • Language: English
