
Model-based speech separation and enhancement with single-microphone input



Abstract

This thesis focuses on the speech source separation problem in a single-microphone scenario. Possible applications of speech separation include recognition, auditory prostheses and surveillance systems. Sound signals typically reach our ears as a mixture of desired signals, other competing sounds and background noise. Example scenarios are talking with someone in a crowd while other people are speaking, or listening to an orchestra with a number of instruments playing concurrently. These sounds often overlap in time and frequency. While humans attend to individual sources remarkably well under these adverse conditions, even with a single ear, the performance of most speech processing systems degrades easily. Modeling how the human auditory system performs is therefore one viable way to extract target speech sources from the mixture before any vulnerable processing stages.

Our approach is based on findings from psychoacoustics. To separate the individual sound sources in a mixture signal, humans exploit perceptual cues such as harmonicity, continuity, contextual information and prior knowledge of familiar auditory patterns. In particular, the application of prior knowledge of speech for top-down separation (called schema-based grouping) is found to be powerful, yet largely unexplored. This thesis proposes a bi-directional, model-based speech separation and enhancement algorithm built on such speech schemas. Because model patterns are employed to generate the successive spectral envelopes of an utterance, the output speech is expected to be natural and intelligible.

The proposed separation algorithm regenerates a target speech source by working out its spectral envelope and harmonic structure. In the first stage, an optimal sequence of Wiener filters is determined for subsequent interference removal.
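As a rough illustration of the Wiener filtering idea described above (a minimal single-frame sketch: the per-bin power spectra are toy values, and the thesis's actual system derives them frame by frame from the matched model patterns):

```python
import numpy as np

def wiener_gain(target_psd, interference_psd, eps=1e-12):
    """Per-bin Wiener gain from estimated target and interference power spectra."""
    return target_psd / (target_psd + interference_psd + eps)

# Toy spectral frame: target dominates the low bins, interference the high bins.
target_psd = np.array([4.0, 4.0, 2.0, 1.0, 0.5, 0.2, 0.1, 0.1])
interference_psd = np.array([0.1, 0.1, 0.5, 1.0, 2.0, 4.0, 4.0, 4.0])

gain = wiener_gain(target_psd, interference_psd)
mixture_spectrum = np.ones(8)          # flat mixture magnitude, for illustration
estimate = gain * mixture_spectrum     # bins dominated by interference are attenuated
```

The gain lies in [0, 1] per bin: close to 1 where the target is estimated to dominate, close to 0 where the interference does, so applying it to the mixture spectrum suppresses the interference-heavy regions.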
Specifically, acoustic models of speech schemas, represented by candidate line spectrum pair (LSP) patterns, are manipulated in a top-down manner to match the input mixture and, when available, the given transcription. Specific LSP patterns are retrieved to constitute a spectral evolution that synchronizes with the target speech source. With this evolution, the mixture spectrum is filtered to approximate the target source at an appropriate signal level. In the second stage, irrelevant harmonic structure from interfering sources is eliminated by comb filtering, with the filters designed according to the results of pitch tracking.

Experiments were carried out on continuous real speech mixed with either a competing speech source or broadband noise. The results show that the separation outputs exhibit spectral trajectories similar to those of the ideal source signals. For speech mixtures, the proposed algorithm is evaluated in two ways: segmental signal-to-interference ratio (segSIR) and Itakura-Saito distortion (dIS). It is found that (1) the interference signal power is reduced, in terms of segSIR improvement, even under the harsh condition where the target and interference powers are similar; and (2) the dIS between the estimated source and the clean speech source is significantly smaller than before processing. These results confirm the capability of the proposed algorithm to extract individual sources from a mixture signal by suppressing the interference and generating appropriate spectral trajectories for the individual source estimates.
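The two evaluation measures can be sketched as follows. The frame length, averaging conventions and synthetic test signals here are illustrative assumptions, not the thesis's exact experimental settings:

```python
import numpy as np

def segmental_sir(target, residual, frame_len=256):
    """Segmental SIR in dB: per-frame target-to-residual power ratio, averaged.
    `residual` is the interference remaining in the separated output."""
    n_frames = len(target) // frame_len
    ratios = []
    for k in range(n_frames):
        s = target[k * frame_len:(k + 1) * frame_len]
        i = residual[k * frame_len:(k + 1) * frame_len]
        ratios.append(10.0 * np.log10(np.sum(s**2) / (np.sum(i**2) + 1e-12)))
    return float(np.mean(ratios))

def itakura_saito(ref_psd, est_psd, eps=1e-12):
    """Itakura-Saito distortion between a reference and an estimated power spectrum."""
    r = (ref_psd + eps) / (est_psd + eps)
    return float(np.mean(r - np.log(r) - 1.0))

# Synthetic check: attenuating the interference residual should raise segSIR.
rng = np.random.default_rng(0)
target = rng.standard_normal(1024)
residual_before = rng.standard_normal(1024)          # interference before separation
residual_after = 0.1 * residual_before               # attenuated residual afterwards
sir_before = segmental_sir(target, residual_before)
sir_after = segmental_sir(target, residual_after)
```

Note that the Itakura-Saito distortion is zero only for identical spectra and is asymmetric in its arguments, so the order (reference first, estimate second) matters when reporting results.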

Bibliographic record

  • Author: Lee, Siu Wa.
  • Affiliation: The Chinese University of Hong Kong (Hong Kong).
  • Degree-granting institution: The Chinese University of Hong Kong (Hong Kong).
  • Subject: Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 252 p.
  • Total pages: 252
  • Format: PDF
  • Language: eng
