Acoustic-feature-based frequency warping for speaker normalization.

Abstract

Speaker-dependent automatic speech recognition systems are known to outperform speaker-independent systems when enough training data are available to overcome the variability of acoustic properties among speakers. Speaker normalization techniques modify the spectral representation of incoming speech waveforms in an attempt to reduce this variability between speakers.

In this work we study the possible benefits of using acoustic features that are believed to be key to speech perception in speaker normalization algorithms based on frequency warping. We study the extent to which such features, specifically including the first three formant frequencies, can improve recognition accuracy and reduce the computational complexity of speaker normalization compared to conventional techniques. We examine the characteristics and limitations of several types of feature sets and warping functions as we compare their performance to that of existing algorithms.

We have found that the specific shape of the warping function appears to be irrelevant in terms of improvement in recognition accuracy. The use of a linear function, the simplest choice, allowed us to employ linear regression to determine which features to use and how to weight them. We present a method that finds the optimal set of weights for a set of speakers, given the slope of the best warping function. Selecting a limited subset of features is a special case of this method in which the weights are restricted to one or zero.

The application of our speaker normalization algorithm to the ARPA Resource Management task resulted in sizable improvements compared to previous techniques. Speaker normalization applied to the ARPA Wall Street Journal (WSJ) and Broadcast News (Hub 4) tasks resulted in more modest improvements. We have investigated the possible causes of this. Our experiments indicate that normalization is less effective with a larger number of speakers, presumably because in this case the output probability densities of the HMMs tend to be broader and hence representative of a large class of speakers. In addition, increasing the vocabulary size tends to enlarge the search space, causing correct hypotheses to be replaced by erroneous ones. The benefit brought about by normalization is thus diluted.

While a number of recent successful speaker normalization algorithms have incorporated speaker-specific frequency warping into the initial signal processing, these algorithms do not make extensive use of the acoustic features contained in the incoming speech.

The improvement provided by normalization also increases with sentence duration in Hub 4. Since the actual Hub 4 data contain a large number of short segments, normalization provides a more limited improvement in performance.
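
The abstract ties a linear warping function to acoustic features through linear regression. The following is a minimal sketch of that idea in plain Python, under assumptions that are ours and not taken from the thesis: each speaker is described by hypothetical features (the ratio of a reference formant frequency to the speaker's own formant frequency, for F1 to F3), each training speaker's best warping slope is assumed to be already known (for example from a grid search over candidate slopes), and that slope is modeled as a weighted sum of the features with no intercept. All function names (`warp_frequencies`, `formant_ratios`, `fit_feature_weights`, `predict_slope`) are illustrative only.

```python
"""Hypothetical sketch: a linear frequency warp whose slope is predicted from
per-speaker acoustic features by least-squares regression.

ASSUMPTIONS (not from the thesis): features are reference/speaker formant
ratios, and each training speaker's best slope is already known.
"""
from typing import List, Sequence


def warp_frequencies(freqs: Sequence[float], alpha: float) -> List[float]:
    """Linear warping f' = alpha * f applied to each frequency in Hz."""
    return [alpha * f for f in freqs]


def formant_ratios(speaker: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Hypothetical feature vector: reference_F_i / speaker_F_i for each formant."""
    return [ref / spk for spk, ref in zip(speaker, reference)]


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot row.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def fit_feature_weights(
    features: Sequence[Sequence[float]], best_slopes: Sequence[float]
) -> List[float]:
    """Weights w minimizing sum_s (alpha_s - w . x_s)^2 (ordinary least squares, no intercept)."""
    k = len(features[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(features, best_slopes)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_slope(weights: Sequence[float], feature_row: Sequence[float]) -> float:
    """Predicted warping slope for one speaker: weighted sum of its features."""
    return sum(w * x for w, x in zip(weights, feature_row))
```

With a synthetic feature matrix whose slopes are an exact weighted sum of the features, `fit_feature_weights` recovers those weights up to rounding, and `warp_frequencies` then applies the predicted slope to a frequency axis. Solving the normal equations directly is adequate for three features; a QR-based solver would be more robust if many strongly correlated features were used.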

Bibliographic record

Author: Gouvea, Evandro Bacci
Affiliation: Carnegie Mellon University
Degree-granting institution: Carnegie Mellon University
Subjects: Engineering, Electronics and Electrical; Computer Science
Degree: Ph.D.
Year: 1999
Pages: 118
Original format: PDF
Language: English

Similar literature

1. Improved Speech-Signal Based Frequency Warping Scale for Cepstral Feature in Robust Speaker Verification System [J]. Sarangi Susanta Kumar, Saha Goutam. Journal of VLSI signal processing systems for signal, image, and video technology. 2020, No. 7.
2. Arabic Audio News Retrieval System Using Dependent Speaker Mode, Mel Frequency Cepstral Coefficient and Dynamic Time Warping Techniques [J]. Hasan Muaidi, Ayat Al-Ahmad, Thaer Khdoor. Research journal of applied science, engineering and technology. 2014, No. 24.
3. Frequency Warping for Speaker Adaptation in HMM-based Speech Synthesis [J]. Weixun Gao, Qiying Cao. Journal of information science and engineering. 2014, No. 4.
4. Comparison of Frequency-Warped Filter Banks in relation to Robust Features for Speaker Identification [C]. Sharada V. Chougule, Mahesh S. Chavan. International Conference on Instrumentation, Measurement, Circuits and Systems. 2014.
5. Frequency warping by linear transformation, and vocal tract inversion for speaker normalization in automatic speech recognition [D]. Panchapagesan, Sankaran. 2008.
6. One-against-All Weighted Dynamic Time Warping for Language-Independent and Speaker-Dependent Speech Recognition in Adverse Conditions [O]. Xianglilan Zhang, Jiping Sun, Zhigang Luo. 2010.
7. Arabic Audio News Retrieval System Using Dependent Speaker Mode, Mel Frequency Cepstral Coefficient and Dynamic Time Warping Techniques [O]. Hasan Muaidi, Ayat Al-Ahmad, Thaer Khdoor. 2014.
