Computer Speech and Language

A computational auditory scene analysis system for speech segregation and robust speech recognition


Abstract

A conventional automatic speech recognizer does not perform well in the presence of multiple sound sources, while human listeners are able to segregate and recognize a signal of interest through auditory scene analysis. We present a computational auditory scene analysis system for separating and recognizing target speech in the presence of competing speech or noise. We estimate, in two stages, the ideal binary time-frequency (T-F) mask, which retains the mixture in a local T-F unit if and only if the target is stronger than the interference within that unit. In the first stage, we use harmonicity to segregate the voiced portions of individual sources in each time frame based on multipitch tracking. Additionally, unvoiced portions are segmented based on an onset/offset analysis. In the second stage, speaker characteristics are used to group the T-F units across time frames. The resulting masks are used in an uncertainty decoding framework for automatic speech recognition. We evaluate our system on a speech separation challenge and show that it yields substantial improvement over the baseline performance.
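The ideal binary mask criterion stated in the abstract reduces to a per-unit comparison of target and interference energies. The following minimal Python sketch illustrates that criterion only; it is not the authors' two-stage estimation system (which must work from the mixture alone) and assumes access to the clean target and interference signals, an STFT-based T-F decomposition, a 0 dB local SNR threshold, and illustrative frame sizes.

```python
import numpy as np
from scipy.signal import stft, istft

def ideal_binary_mask(target, interference, fs=16000, nperseg=512):
    """Keep a T-F unit iff the target is stronger than the interference there."""
    _, _, T = stft(target, fs=fs, nperseg=nperseg)        # target spectrogram
    _, _, N = stft(interference, fs=fs, nperseg=nperseg)  # interference spectrogram
    # 1 where local target energy exceeds interference energy (local SNR > 0 dB)
    return (np.abs(T) ** 2 > np.abs(N) ** 2).astype(float)

def apply_mask(mixture, mask, fs=16000, nperseg=512):
    """Retain the mixture only in T-F units assigned to the target, then resynthesize."""
    _, _, M = stft(mixture, fs=fs, nperseg=nperseg)
    _, resynth = istft(M * mask, fs=fs, nperseg=nperseg)
    return resynth

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    target = np.sin(2 * np.pi * 220 * t)          # toy stand-in for target speech
    interference = 0.5 * np.random.randn(fs)      # toy stand-in for competing noise
    mixture = target + interference
    mask = ideal_binary_mask(target, interference, fs)
    segregated = apply_mask(mixture, mask, fs)
    print(mask.shape, segregated.shape)
```

In the paper's system this mask is not given but estimated from the mixture, using harmonicity and multipitch tracking for voiced regions, onset/offset segmentation for unvoiced regions, and speaker characteristics to group units across frames, before being passed to an uncertainty-decoding recognizer.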
