
Model-based speech separation and enhancement with single-microphone input



Abstract

This thesis focuses on the speech source separation problem in a single-microphone scenario. Possible applications of speech separation include recognition, auditory prostheses and surveillance systems. Sound signals typically reach our ears as a mixture of desired signals, other competing sounds and background noise. Example scenarios are talking with someone in a crowd while other people are speaking, or listening to an orchestra with a number of instruments playing concurrently. These sounds often overlap in time and frequency. While humans attend to individual sources remarkably well under these adverse conditions, even with a single ear, the performance of most speech processing systems degrades easily. Modeling how the human auditory system performs is therefore one viable way to extract target speech sources from the mixture before any vulnerable processing stages.

Our approach is based on findings from psychoacoustics. To separate the individual sound sources in a mixture signal, humans exploit perceptual cues such as harmonicity, continuity, contextual information and prior knowledge of familiar auditory patterns. In particular, the application of prior knowledge of speech for top-down separation (called schema-based grouping) is found to be powerful, yet largely unexplored. This thesis proposes a bi-directional, model-based speech separation and enhancement algorithm built on such speech schemas. Because model patterns are employed to generate the successive spectral envelopes of an utterance, the output speech is expected to be natural and intelligible.

The proposed separation algorithm regenerates a target speech source by working out its spectral envelope and harmonic structure. In the first stage, an optimal sequence of Wiener filters is determined for subsequent interference removal.
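As a rough illustration of the Wiener filtering idea described above (a minimal single-frame sketch: the per-bin power spectra are toy values, and the thesis's actual system derives them frame by frame from the matched model patterns):

```python
import numpy as np

def wiener_gain(target_psd, interference_psd, eps=1e-12):
    """Per-bin Wiener gain from estimated target and interference power spectra."""
    return target_psd / (target_psd + interference_psd + eps)

# Toy spectral frame: target dominates the low bins, interference the high bins.
target_psd = np.array([4.0, 4.0, 2.0, 1.0, 0.5, 0.2, 0.1, 0.1])
interference_psd = np.array([0.1, 0.1, 0.5, 1.0, 2.0, 4.0, 4.0, 4.0])

gain = wiener_gain(target_psd, interference_psd)
mixture_spectrum = np.ones(8)          # flat mixture magnitude, for illustration
estimate = gain * mixture_spectrum     # bins dominated by interference are attenuated
```

The gain lies in [0, 1] per bin: close to 1 where the target is estimated to dominate, close to 0 where the interference does, so applying it to the mixture spectrum suppresses the interference-heavy regions.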
Specifically, acoustic models of speech schemas, represented by candidate line spectrum pair (LSP) patterns, are manipulated in a top-down manner to match the input mixture and, when available, the given transcription. Specific LSP patterns are retrieved to constitute a spectral evolution that synchronizes with the target speech source. With this evolution, the mixture spectrum is filtered to approximate the target source at an appropriate signal level. In the second stage, irrelevant harmonic structure from interfering sources is eliminated by comb filtering, with the filters designed according to the results of pitch tracking.

Experiments were carried out on continuous real speech mixed with either a competing speech source or broadband noise. The results show that the separation outputs exhibit spectral trajectories similar to those of the ideal source signals. For speech mixtures, the proposed algorithm is evaluated in two ways: segmental signal-to-interference ratio (segSIR) and Itakura-Saito distortion (dIS). It is found that (1) the interference signal power is reduced, in terms of segSIR improvement, even under the harsh condition where the target and interference powers are similar; and (2) the dIS between the estimated source and the clean speech source is significantly smaller than before processing. These results confirm the capability of the proposed algorithm to extract individual sources from a mixture signal by suppressing the interference and generating appropriate spectral trajectories for the individual source estimates.
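The two evaluation measures can be sketched as follows. The frame length, averaging conventions and synthetic test signals here are illustrative assumptions, not the thesis's exact experimental settings:

```python
import numpy as np

def segmental_sir(target, residual, frame_len=256):
    """Segmental SIR in dB: per-frame target-to-residual power ratio, averaged.
    `residual` is the interference remaining in the separated output."""
    n_frames = len(target) // frame_len
    ratios = []
    for k in range(n_frames):
        s = target[k * frame_len:(k + 1) * frame_len]
        i = residual[k * frame_len:(k + 1) * frame_len]
        ratios.append(10.0 * np.log10(np.sum(s**2) / (np.sum(i**2) + 1e-12)))
    return float(np.mean(ratios))

def itakura_saito(ref_psd, est_psd, eps=1e-12):
    """Itakura-Saito distortion between a reference and an estimated power spectrum."""
    r = (ref_psd + eps) / (est_psd + eps)
    return float(np.mean(r - np.log(r) - 1.0))

# Synthetic check: attenuating the interference residual should raise segSIR.
rng = np.random.default_rng(0)
target = rng.standard_normal(1024)
residual_before = rng.standard_normal(1024)          # interference before separation
residual_after = 0.1 * residual_before               # attenuated residual afterwards
sir_before = segmental_sir(target, residual_before)
sir_after = segmental_sir(target, residual_after)
```

Note that the Itakura-Saito distortion is zero only for identical spectra and is asymmetric in its arguments, so the order (reference first, estimate second) matters when reporting results.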

Bibliographic record

  • Author: Lee, Siu Wa.
  • Affiliation: The Chinese University of Hong Kong (Hong Kong).
  • Degree-granting institution: The Chinese University of Hong Kong (Hong Kong).
  • Subject: Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 252 p.
  • Total pages: 252
  • Format: PDF
  • Language: eng
