
Temporal patterns of frequency-localized features in ASR.


Abstract

This work investigates the use of frequency-localized temporal patterns of the speech signal for developing a robust front-end for Automatic Speech Recognition (ASR). Various linear transforms are investigated for parameterization of the frequency-localized temporal patterns. We show that temporal patterns closely follow the properties of a first-order Markov process, which results in the PCA transforms being very close to the DCT transform. Better recognition performance is achieved when using the DCT components of temporal patterns than when using the temporal patterns directly for feature estimation. Other linear transforms, such as Linear Discriminant Analysis (LDA), are also studied for the parameterization. The parameterized TempoRAl PatternS (TRAPS) are used to estimate broad-phonetic class-posteriors independently in each critical band. These class-posteriors are combined and used as the features for word recognition. Our work shows that broad-phonetic features generalize better than other conventional features and yield considerable complementary information with respect to short-term cepstral features in ASR. Two practical applications are proposed for the broad-phonetic TRAPS features: (1) Distributed Speech Recognition (DSR) in cellular telephony, and (2) Voice Activity Detection (VAD) tasks. These features yield a significant improvement in performance for these applications. New band-independent categories are proposed which represent distinct speech-events in the frequency-localized temporal patterns of the speech signal. These categories are obtained by clustering the mean temporal patterns of context-independent phones using an agglomerative hierarchical clustering technique. A Universal TempoRAl PatternS (UTRAPS) system is proposed for estimating the speech-event class-posteriors. Combining UTRAPS features with cepstral features achieves a significant improvement in recognition performance under noisy conditions. Finally, this work studies the effect of broadening the frequency context on TRAPS features and ASR. This study shows that combining temporal patterns from more than one critical band is important for achieving higher recognition rates.
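
The link between a first-order Markov model and the DCT can be checked numerically. The sketch below is an illustration only and is not taken from the thesis: it assumes an idealized first-order Markov covariance with entries rho^|i-j|, and the pattern length `n`, the correlation `rho`, and all function names are hypothetical choices. It compares the PCA basis (eigenvectors of that covariance) with the orthonormal DCT-II basis.

```python
import numpy as np


def markov1_covariance(n, rho):
    """Covariance of an idealized first-order Markov process: C[i, j] = rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def dct2_basis(n):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    basis = np.cos(np.pi * (t + 0.5) * k / n)
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis


def pca_basis(cov):
    """Eigenvectors of cov as rows, sorted by decreasing eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order].T


def basis_similarity(n=16, rho=0.95):
    """Absolute cosine similarity between the k-th PCA vector and the k-th DCT vector."""
    pca = pca_basis(markov1_covariance(n, rho))
    dct = dct2_basis(n)
    # Both bases have unit-norm rows, so the row-wise dot product is the cosine.
    # The absolute value removes the arbitrary sign of each eigenvector.
    return np.abs(np.sum(pca * dct, axis=1))


if __name__ == "__main__":
    sims = basis_similarity()
    for k, s in enumerate(sims):
        print(f"component {k:2d}: |cos| = {s:.4f}")
```

Values close to 1 mean that the corresponding PCA and DCT basis vectors nearly coincide. Real temporal patterns follow the Markov model only approximately, so a PCA basis estimated from speech data would match the DCT less exactly than this idealized case.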
