IEEE Transactions on Audio, Speech, and Language Processing

A Corpus-Based Approach to Speech Enhancement From Nonstationary Noise



Abstract

Temporal dynamics and speaker characteristics are two important features of speech that distinguish speech from noise. In this paper, we propose a method to maximally extract these two features of speech for speech enhancement. We demonstrate that this can reduce the requirement for prior information about the noise, which can be difficult to estimate for fast-varying noise. Given noisy speech, the new approach estimates clean speech by recognizing long segments of the clean speech as whole units. In the recognition, clean speech sentences, taken from a speech corpus, are used as examples. Matching segments are identified between the noisy sentence and the corpus sentences. The estimate is formed by using the longest matching segments found in the corpus sentences. Longer speech segments as whole units contain more distinct dynamics and richer speaker characteristics, and can be identified more accurately from noise than shorter speech segments. Therefore, estimation based on the longest recognized segments increases the noise immunity and hence the estimation accuracy. The new approach consists of a statistical model to represent up to sentence-long temporal dynamics in the corpus speech, and an algorithm to identify the longest matching segments between the noisy sentence and the corpus sentences. The algorithm is made more robust to noise uncertainty by introducing missing-feature based noise compensation into the corpus sentences. Experiments have been conducted on the TIMIT database for speech enhancement from various types of nonstationary noise including song, music, and crosstalk speech. The new approach has shown improved performance over conventional enhancement algorithms in both objective and subjective evaluations.
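The abstract describes the segment-matching step only at a high level; the paper's statistical model and missing-feature noise compensation are not given here. Purely as an illustrative sketch of the longest-matching-segment idea, the Python fragment below matches noisy feature frames against clean corpus frames with a simple Euclidean-distance test and, for each noisy frame, copies the clean frame from the longest contiguous run of matches that covers it. The function name, the distance threshold, and the MFCC-like features are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def longest_matching_segments(noisy, corpus, threshold=1.0):
    """Estimate clean frames for a noisy sentence by copying frames from the
    longest contiguous runs of matching frames found in clean corpus sentences.

    noisy:  (T, D) array of spectral feature frames for the noisy sentence.
    corpus: list of (T_c, D) arrays, one per clean corpus sentence.
    """
    T, _ = noisy.shape
    best_len = np.zeros(T, dtype=int)   # longest matching run covering each noisy frame
    estimate = noisy.copy()             # fall back to the noisy frame if nothing matches

    for clean in corpus:
        T_c = clean.shape[0]
        # Frame-level match: Euclidean distance below a (hypothetical) threshold.
        dist = np.linalg.norm(noisy[:, None, :] - clean[None, :, :], axis=-1)
        match = dist < threshold

        # run[t, s]: length of the diagonal run of matching frames ending at (t, s),
        # i.e. noisy frames t-L+1..t aligned with clean frames s-L+1..s.
        run = np.zeros((T, T_c), dtype=int)
        for t in range(T):
            for s in range(T_c):
                if match[t, s]:
                    run[t, s] = 1 + (run[t - 1, s - 1] if t > 0 and s > 0 else 0)

        # Every noisy frame inside a run is a candidate to be replaced by the
        # corresponding clean corpus frame; prefer the longest run that covers it.
        for t in range(T):
            for s in range(T_c):
                L = run[t, s]
                for k in range(L):
                    tt, ss = t - k, s - k
                    if L > best_len[tt]:
                        best_len[tt] = L
                        estimate[tt] = clean[ss]
    return estimate, best_len


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean_sentence = rng.normal(size=(50, 13))                      # MFCC-like features
    noisy_sentence = clean_sentence + 0.05 * rng.normal(size=(50, 13))
    est, lengths = longest_matching_segments(noisy_sentence, [clean_sentence], threshold=0.5)
    print("mean matched run length:", lengths.mean())
```

In the approach described above, the matching is driven by a statistical model of up to sentence-long temporal dynamics and is made robust to noise uncertainty through missing-feature compensation, rather than by a fixed distance threshold as in this toy sketch.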
