Method and system for speech reconstruction from speech recognition features, pitch and voicing with resampled basis functions providing reconstruction of the spectral envelope

Abstract

A speech reconstruction method and system for converting a series of binned spectra, or functions thereof such as Mel Frequency Cepstral Coefficients (MFCC), of an original digitized speech signal into a reconstructed speech signal, where each binned spectrum has a respective pitch value and voicing decision. The binned spectra are derived from the original digitized speech signal at successive time instants by multiplying each estimate of the spectral envelope by a predetermined set of frequency-domain window functions and computing the integrals of the products. At each time instant, harmonic frequencies and weights are generated according to the respective pitch value and voicing decision. Basis functions with bounded support on the frequency axis are each sampled at all of the harmonic frequencies that lie within their support, and the samples are multiplied by the respective harmonic weights. The sampled basis functions are combined with respective phases, generated according to the pitch value, the voicing decision and possibly the binned spectrum, yielding a complex line spectrum for each basis function. Coefficients of the basis functions are then generated, and every point of each complex line spectrum is multiplied by the coefficient of its basis function. The complex line spectra are summed to give, for each time instant, a single complex line spectrum with values at all harmonic frequencies. A time signal is generated from the complex line spectra computed at successive time instants.
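The per-frame steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the triangular basis shape, the fixed 100 Hz grid used for unvoiced frames, the default unit weights and zero phases, the use of one phase per harmonic (the abstract generates phases for each basis function), and all function names (`triangular_basis`, `harmonic_frequencies`, `line_spectrum`, `synthesize_frame`) are assumptions chosen for illustration. The abstract does not say how the basis-function coefficients are obtained from the binned spectra, so the sketch takes them as given inputs.

```python
import cmath
import math


def triangular_basis(center_hz, half_width_hz):
    """Hypothetical basis function with bounded support
    [center_hz - half_width_hz, center_hz + half_width_hz]."""
    def basis(f_hz):
        return max(0.0, 1.0 - abs(f_hz - center_hz) / half_width_hz)
    return basis


def harmonic_frequencies(pitch_hz, voiced, max_hz, unvoiced_spacing_hz=100.0):
    """Multiples of the pitch for voiced frames; for unvoiced frames a
    fixed frequency grid is used instead (an assumption of this sketch)."""
    f0 = pitch_hz if voiced else unvoiced_spacing_hz
    count = int(max_hz // f0)
    return [k * f0 for k in range(1, count + 1)]


def line_spectrum(pitch_hz, voiced, coeffs, bases, max_hz,
                  weights=None, phases=None):
    """Sample each basis function at the harmonics inside its support,
    apply harmonic weights and phases, scale by the basis coefficient,
    and sum everything into one complex line spectrum."""
    freqs = harmonic_frequencies(pitch_hz, voiced, max_hz)
    weights = weights or [1.0] * len(freqs)   # assumed: unit weights
    phases = phases or [0.0] * len(freqs)     # assumed: zero phases
    spectrum = [0j] * len(freqs)
    for coeff, basis in zip(coeffs, bases):
        for i, f in enumerate(freqs):
            sample = basis(f)
            if sample == 0.0:                 # harmonic outside the support
                continue
            spectrum[i] += coeff * weights[i] * sample * cmath.exp(1j * phases[i])
    return freqs, spectrum


def synthesize_frame(freqs, spectrum, sample_rate_hz, num_samples):
    """Turn one line spectrum into a time signal as a sum of sinusoids."""
    frame = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        frame.append(sum(abs(a) * math.cos(2.0 * math.pi * f * t + cmath.phase(a))
                         for f, a in zip(freqs, spectrum)))
    return frame


if __name__ == "__main__":
    # Four overlapping triangles covering 0 to 4 kHz (illustrative values).
    bases = [triangular_basis(c, 1000.0) for c in (500.0, 1500.0, 2500.0, 3500.0)]
    freqs, spec = line_spectrum(200.0, True, [1.0, 0.5, 0.25, 0.125], bases, 4000.0)
    frame = synthesize_frame(freqs, spec, 8000.0, 160)
    print(len(freqs), len(frame))
```

Because the basis functions are resampled at whatever harmonic frequencies the current pitch dictates, the same set of coefficients yields a consistent spectral envelope at any pitch, which is the point of using bounded-support basis functions rather than a fixed frequency grid. A full system would also have to join successive frames into a continuous signal, which this sketch leaves out.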

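Bibliographic details

Publication number: US6725190B1
Publication date: 2004-04-20
Original format: PDF
Applicant/assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Application number: US19990432081
Inventors: DAN CHAZAN; GILAD COHEN; RON HOORY
Filing date: 1999-11-02
Classification: G10L190/20
Country: US
Date added to database: 2022-08-21 23:13:44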
