Speech Communication

Exploiting correlogram structure for robust speech recognition with multiple speech sources

Abstract

This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture in the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as tightly coupled processes. In the first stage, sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located at the delays corresponding to multiples of the pitch period. These pitch-related structures are exploited in the study to group spectral components at each time frame. Local pitch estimates are then computed for each spectral group and are used to form simultaneous pitch tracks for temporal integration. These processes segregate a spectral representation of the acoustic mixture into several time-frequency regions such that the energy in each region is likely to have originated from a single periodic sound source. The identified time-frequency regions, together with the spectral representation, are passed to a 'speech fragment decoder', which applies 'missing data' techniques with clean speech models to search simultaneously for the acoustic evidence that best matches the model sequences. The paper presents evaluations based on artificially mixed simultaneous speech utterances. A coherence-measuring experiment is first reported, which quantifies the consistency of the identified fragments with a single source. The system is then evaluated in a speech recognition task and compared with a conventional fragment generation approach. Results show that the proposed system produces more coherent fragments across different conditions, which leads to significantly better recognition accuracy.
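The correlogram front end described above, per-channel short-time autocorrelation whose pitch-period peaks can be pooled across channels, is sketched below for readers unfamiliar with the representation. This is a minimal illustration, not code from the paper: the `channels` input, the `correlogram` and `summary_pitch_lag` functions, and all parameter values are assumptions made for this sketch, and the paper's auditory front end, spectral grouping, and pitch-track formation are not reproduced.

```python
import numpy as np

def correlogram(channels, frame_start, frame_len, max_lag):
    """One correlogram frame: a (num_channels, max_lag) array of short-time
    autocorrelations, one row per auditory filterbank channel.

    `channels` is a hypothetical (num_channels, num_samples) array of
    band-pass filtered signals, e.g. from a gammatone filterbank; this is
    an assumption of the sketch, not the paper's exact front end."""
    num_channels = channels.shape[0]
    acg = np.zeros((num_channels, max_lag))
    segment = channels[:, frame_start:frame_start + frame_len + max_lag]
    for c in range(num_channels):
        x = segment[c]
        for lag in range(max_lag):
            # Unnormalised autocorrelation of this channel at the given lag.
            acg[c, lag] = np.dot(x[:frame_len], x[lag:lag + frame_len])
    # Normalise by the zero-lag energy so structure, not level, dominates.
    return acg / np.maximum(acg[:, :1], 1e-12)

def summary_pitch_lag(acg, min_lag=32):
    """Frame-level pitch-period estimate (in samples): the lag of the largest
    peak of the summary correlogram outside the zero-lag region."""
    summary = acg.sum(axis=0)
    return min_lag + int(np.argmax(summary[min_lag:]))

# Example (synthetic): two filterbank channels carrying the fundamental and
# third harmonic of a 100 Hz source, sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
channels = np.vstack([np.sin(2 * np.pi * 100 * t),
                      np.sin(2 * np.pi * 300 * t)])
acg = correlogram(channels, frame_start=0, frame_len=320, max_lag=200)
print(summary_pitch_lag(acg))  # ~80 samples = 10 ms, i.e. a 100 Hz pitch
```

In the system described in the abstract, per-frame estimates of this kind would be the starting point for grouping spectral components and forming simultaneous pitch tracks; the sketch stops at a single summary estimate per frame.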
