ENHANCEMENT OF SPEECH INTELLIGIBILITY USING SPEECH TRANSIENTS EXTRACTED BY A WAVELET PACKET-BASED REAL-TIME ALGORITHM

Abstract

Studies have shown that transient speech, which is associated with consonants, transitions between consonants and vowels, and transitions within some vowels, is an important cue for identifying and discriminating speech sounds. However, compared to the relatively steady-state vowel segments of speech, transient speech has much lower energy and is therefore easily masked by background noise. Emphasizing transient speech can improve the intelligibility of speech in background noise, but studies demonstrating this improvement have either identified transient speech manually or proposed algorithms that cannot be implemented to run in real time.

We have developed an algorithm that automatically extracts transient speech in real time. The algorithm uses a function, which we term the transitivity function, to characterize the rate of change of the wavelet coefficients in a wavelet packet transform representation of a speech signal. The transitivity function is large and positive when the signal is changing rapidly and small when the signal is in a steady state. Two definitions of the transitivity function, one based on short-time energy and the other on Mel-frequency cepstral coefficients (MFCCs), were evaluated experimentally; the MFCC-based transitivity function produced better results. The extracted transient speech signal is combined with the original speech to create modified speech.

To facilitate comparison of our transient and modified speech with speech processed by methods that other researchers have proposed for emphasizing transients, we developed three indices. The indices characterize the extent to which a speech modification/processing method emphasizes (1) a particular region of speech, (2) consonants relative to vowels, and (3) onsets and offsets of formants compared to steady-state formants.
These indices are useful because they quantify differences between speech signals that are difficult to show with spectrograms, spectra, and time-domain waveforms.

The transient extraction algorithm includes parameters that, when varied, influence the intelligibility of the extracted transient speech. The best values for these parameters were selected through psychoacoustic testing. Psychoacoustic measurements of speech intelligibility in background noise showed that modified speech was more intelligible than the original speech, especially at high noise levels (signal-to-noise ratios of -20 and -15 dB). Incorporating into the algorithm a method that automatically identifies and boosts unvoiced speech was also evaluated; this method did not produce additional improvements in speech intelligibility.
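The abstract does not give formulas for the transitivity function, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a Haar wavelet packet decomposition, defines a hypothetical energy-based transitivity as the Euclidean distance between the subband log-energy vectors of consecutive frames, treats frames whose transitivity exceeds a threshold as "transient", and adds that transient signal back to the original with a fixed gain to form modified speech. The wavelet, decomposition depth, frame length, threshold, gain, and the names `haar_packet`, `transitivity`, and `modified_speech` are all illustrative choices. A hypothetical MFCC-based variant could substitute per-frame MFCC vectors for the subband log energies.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)


def haar_packet(frame, level):
    """Full Haar wavelet packet decomposition of one frame.

    Returns 2**level coefficient lists in natural (Paley) order, not sorted
    by frequency. len(frame) must be divisible by 2**level.
    """
    nodes = [list(frame)]
    for _ in range(level):
        children = []
        for x in nodes:
            pairs = range(len(x) // 2)
            children.append([(x[2 * i] + x[2 * i + 1]) * SQRT_HALF for i in pairs])  # low-pass
            children.append([(x[2 * i] - x[2 * i + 1]) * SQRT_HALF for i in pairs])  # high-pass
        nodes = children
    return nodes


def subband_log_energies(frame, level, eps=1e-12):
    """Log short-time energy of each wavelet packet subband of one frame."""
    return [math.log(sum(c * c for c in band) + eps) for band in haar_packet(frame, level)]


def transitivity(signal, frame_len=256, level=3):
    """Hypothetical energy-based transitivity: Euclidean distance between the
    subband log-energy vectors of consecutive non-overlapping frames.
    Small in steady state, large when the subband energies change quickly."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    feats = [subband_log_energies(signal[s:s + frame_len], level) for s in starts]
    t = [0.0] if feats else []  # the first frame has no predecessor
    for prev, cur in zip(feats, feats[1:]):
        t.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(cur, prev))))
    return t


def modified_speech(signal, frame_len=256, level=3, threshold=1.0, gain=2.0):
    """Keep frames whose transitivity exceeds `threshold` as the transient
    signal, then add it to the original speech scaled by a hypothetical gain."""
    transient = [0.0] * len(signal)
    for k, tk in enumerate(transitivity(signal, frame_len, level)):
        if tk > threshold:
            for i in range(k * frame_len, (k + 1) * frame_len):
                transient[i] = signal[i]
    return [x + gain * y for x, y in zip(signal, transient)]


# Usage: a steady low-frequency tone that switches abruptly to a higher tone.
signal = [math.sin(2 * math.pi * 0.01 * n) for n in range(1024)]
signal += [math.sin(2 * math.pi * 0.2 * n) for n in range(1024)]
print([round(v, 2) for v in transitivity(signal)])  # peaks at index 4, where the tone changes
enhanced = modified_speech(signal)
```

Because each transitivity value depends only on the current and previous frame, this sketch is causal and could run frame by frame, which is in the spirit of the real-time requirement; the per-frame cost is dominated by the packet decomposition.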