Acoustic-feature-based frequency warping for speaker normalization.

Abstract

Speaker-dependent automatic speech recognition systems are known to outperform speaker-independent systems when enough training data are available to overcome the variability of acoustical properties among speakers. Speaker normalization techniques modify the spectral representation of incoming speech waveforms in an attempt to reduce variability between speakers. While a number of recent successful speaker normalization algorithms have incorporated speaker-specific frequency warping into the initial signal processing, these algorithms do not make extensive use of acoustic features contained in the incoming speech.

In this work we study the possible benefits of using acoustic features that are believed to be key to speech perception in speaker normalization algorithms based on frequency warping. We study the extent to which the use of such features, specifically including the first three formant frequencies, can improve recognition accuracy and reduce computational complexity for speaker normalization compared to conventional techniques. We examine the characteristics and limitations of several types of feature sets and warping functions, and compare their performance with that of existing algorithms.

We have found that the specific shape of the warping function appears to be irrelevant in terms of improvement in recognition accuracy. The use of a linear function, the simplest choice, allowed us to employ linear regression to define which features to use and how to weight them. We present a method that finds the optimal set of weights for a set of speakers given the slope of the best warping function. Selecting a limited subset of features is a special case of this method in which the weights are restricted to one or zero.

Applying our speaker normalization algorithm to the ARPA Resource Management task resulted in sizable improvements compared to previous techniques. Speaker normalization applied to the ARPA Wall Street Journal (WSJ) and Broadcast News (Hub 4) tasks resulted in more modest improvements. We have investigated the possible causes of this. Our experiments indicate that normalization is less effective with a larger number of speakers, presumably because in this case the output probability densities of the HMMs tend to be broader and hence representative of a large class of speakers. In addition, increasing the vocabulary size tends to increase the search space, causing correct hypotheses to be replaced by erroneous ones. The benefit brought about by normalization is thus diluted. The amount of improvement provided by normalization also increases with increasing sentence duration in Hub 4. Since the actual Hub 4 contains a large number of short segments, the normalization provides a more limited improvement in performance.
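The regression step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes that each speaker's best linear warping slope is already known (for example from a search over candidate slopes) and that the slope is modeled as an intercept plus a weighted sum of the speaker's first three formant frequencies in kHz. The function names, the intercept term, and the choice of units are all hypothetical.

```python
"""Sketch: regress each speaker's best linear-warp slope on formant features.

Assumptions (not from the thesis): slope ~ w0 + w1*F1 + w2*F2 + w3*F3,
with formants given in kHz and one feature vector per speaker.
"""


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(formants_khz, slopes):
    """Return least-squares weights [w0, w1, ...]; w0 is the intercept."""
    rows = [[1.0] + list(f) for f in formants_khz]  # prepend the bias term
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * s for r, s in zip(rows, slopes)) for i in range(k)]
    return solve(xtx, xty)


def predict_slope(weights, formants_khz):
    """Predict a speaker's warping slope from that speaker's formants."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], formants_khz))


def warp_frequency(freq_hz, slope):
    """Linear frequency warping: scale the frequency axis by the slope."""
    return slope * freq_hz


if __name__ == "__main__":
    # Made-up illustrative data: (F1, F2, F3) in kHz and best slope per speaker.
    formants = [(0.50, 1.50, 2.50), (0.60, 1.70, 2.80), (0.45, 1.40, 2.60),
                (0.55, 1.65, 2.45), (0.65, 1.55, 2.90), (0.48, 1.80, 2.70)]
    slopes = [1.05, 0.98, 1.08, 1.04, 0.95, 1.02]
    weights = fit_weights(formants, slopes)
    slope = predict_slope(weights, (0.52, 1.60, 2.55))
    print(weights, warp_frequency(1000.0, slope))
```

Under these assumptions, restricting the weights to one or zero, as the abstract mentions, would amount to choosing which feature columns enter the fit. The normal-equation solver is used here only to keep the sketch dependency-free; a library least-squares routine would be the more usual choice.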

Bibliographic record

  • Author

    Gouvea, Evandro Bacci

  • Author affiliation

    Carnegie Mellon University

  • Degree-granting institution: Carnegie Mellon University
  • Subject: Engineering, Electronics and Electrical; Computer Science
  • Degree: Ph.D.
  • Year: 1999
  • Pages: 118 p.
  • Total pages: 118
  • Original format: PDF
  • Language: English