Journal: Circuits, Systems, and Signal Processing

Role of Linear, Mel and Inverse-Mel Filterbanks in Automatic Recognition of Speech from High-Pitched Speakers

Abstract

In automatic speech recognition (ASR), the power spectrum is generally warped to the Mel scale during front-end speech parameterization, motivated by the fact that human perception of sound is nonlinear. The Mel filterbank provides finer resolution for low-frequency content, while more averaging takes place in the high-frequency range. This paper studies the role of linear, Mel and inverse-Mel filterbanks in ASR. When speech comes from high-pitched speakers such as children, a significant amount of relevant information lies in the high-frequency region. Down-sampling that range through the Mel filterbank therefore reduces recognition performance, whereas inverse-Mel or linear filterbanks are expected to be more effective in such cases. This is validated experimentally: an ASR system is trained on adults' speech and tested on data from both adult and child speakers. Recognition rates improve significantly for children's as well as adult females' speech when a linear or inverse-Mel filterbank is used, with the linear filterbank giving a relative improvement of 21% over the baseline. To boost performance further, vocal-tract length normalization, explicit pitch scaling and pitch-adaptive spectral estimation are also explored on top of the linear filterbank.
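The abstract contrasts three ways of placing the triangular filters that turn a power spectrum into filterbank energies. The sketch below (standard-library Python) builds all three so the resolution trade-off can be inspected directly. It is a minimal illustration under stated assumptions, not the paper's implementation: the HTK-style Mel formula is assumed, the inverse-Mel bank is constructed by mirroring the Mel band edges about the middle of the analysis band (one common construction; the paper's exact definition may differ), and the function names and the 23-filter, 8 kHz settings are hypothetical choices for illustration.

```python
import math


def hz_to_mel(f):
    """HTK-style Mel mapping (assumed; other Mel formulas exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def band_edges(scale, num_filters, f_min, f_max):
    """Return num_filters + 2 ascending edge frequencies (Hz).

    Filter j spans edges[j] .. edges[j + 2] and peaks at edges[j + 1].
    """
    n = num_filters + 2
    if scale == "linear":
        # Equal spacing in Hz: every filter has the same bandwidth.
        return [f_min + (f_max - f_min) * i / (n - 1) for i in range(n)]
    if scale == "mel":
        # Equal spacing on the Mel axis: narrow filters at low frequencies,
        # wide (more averaging) filters at high frequencies.
        lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
        return [mel_to_hz(lo + (hi - lo) * i / (n - 1)) for i in range(n)]
    if scale == "inverse_mel":
        # Assumption: mirror the Mel edges about the middle of the band,
        # giving wide filters at low frequencies and narrow ones at high.
        mel = band_edges("mel", num_filters, f_min, f_max)
        return [f_min + f_max - f for f in reversed(mel)]
    raise ValueError(f"unknown scale: {scale!r}")


def triangular_filterbank(scale, num_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Weight matrix: num_filters rows x (n_fft // 2 + 1) FFT-bin columns."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    edges = band_edges(scale, num_filters, f_min, f_max)
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(num_filters):
        left, centre, right = edges[j], edges[j + 1], edges[j + 2]
        row = []
        for f in bin_freqs:
            if left < f <= centre:
                row.append((f - left) / (centre - left))  # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_filterbank_energies(power_spectrum, bank, floor=1e-10):
    """Log of each filter's weighted sum of power-spectrum bins."""
    return [
        math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), floor))
        for row in bank
    ]


if __name__ == "__main__":
    # Illustrative settings only (not taken from the paper): 23 filters, 8 kHz audio.
    for scale in ("linear", "mel", "inverse_mel"):
        e = band_edges(scale, 23, 0.0, 4000.0)
        widths = [e[j + 2] - e[j] for j in range(23)]
        print(f"{scale:12s} lowest filter {widths[0]:7.1f} Hz, highest {widths[-1]:7.1f} Hz")
```

Running the script prints the bandwidth of the lowest and highest filter for each scale: the Mel bank is narrowest at the bottom and widest at the top, the inverse-Mel bank is the reverse, and the linear bank is constant. This is the resolution trade-off the abstract attributes to high-pitched speech, where the Mel bank averages away detail in the upper band.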