Speech Communication

Exploiting correlogram structure for robust speech recognition with multiple speech sources

Abstract

This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture in the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as tightly coupled processes. In the first stage, sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located at the delays corresponding to multiples of the pitch period. These pitch-related structures are exploited in the study to group spectral components at each time frame. Local pitch estimates are then computed for each spectral group and are used to form simultaneous pitch tracks for temporal integration. These processes segregate a spectral representation of the acoustic mixture into several time-frequency regions such that the energy in each region is likely to have originated from a single periodic sound source. The identified time-frequency regions, together with the spectral representation, are passed to a 'speech fragment decoder', which applies 'missing data' techniques with clean speech models to search simultaneously for the acoustic evidence that best matches the model sequences. The paper presents evaluations based on artificially mixed simultaneous speech utterances. A coherence-measuring experiment is first reported, which quantifies the consistency of the identified fragments with a single source. The system is then evaluated in a speech recognition task and compared with a conventional fragment generation approach. Results show that the proposed system produces more coherent fragments across different conditions, which leads to significantly better recognition accuracy.
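The correlogram front end described above, per-channel short-time autocorrelation whose pitch-period peaks can be pooled across channels, is sketched below for readers unfamiliar with the representation. This is a minimal illustration, not code from the paper: the `channels` input, the `correlogram` and `summary_pitch_lag` functions, and all parameter values are assumptions made for this sketch, and the paper's auditory front end, spectral grouping, and pitch-track formation are not reproduced.

```python
import numpy as np

def correlogram(channels, frame_start, frame_len, max_lag):
    """One correlogram frame: a (num_channels, max_lag) array of short-time
    autocorrelations, one row per auditory filterbank channel.

    `channels` is a hypothetical (num_channels, num_samples) array of
    band-pass filtered signals, e.g. from a gammatone filterbank; this is
    an assumption of the sketch, not the paper's exact front end."""
    num_channels = channels.shape[0]
    acg = np.zeros((num_channels, max_lag))
    segment = channels[:, frame_start:frame_start + frame_len + max_lag]
    for c in range(num_channels):
        x = segment[c]
        for lag in range(max_lag):
            # Unnormalised autocorrelation of this channel at the given lag.
            acg[c, lag] = np.dot(x[:frame_len], x[lag:lag + frame_len])
    # Normalise by the zero-lag energy so structure, not level, dominates.
    return acg / np.maximum(acg[:, :1], 1e-12)

def summary_pitch_lag(acg, min_lag=32):
    """Frame-level pitch-period estimate (in samples): the lag of the largest
    peak of the summary correlogram outside the zero-lag region."""
    summary = acg.sum(axis=0)
    return min_lag + int(np.argmax(summary[min_lag:]))

# Example (synthetic): two filterbank channels carrying the fundamental and
# third harmonic of a 100 Hz source, sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
channels = np.vstack([np.sin(2 * np.pi * 100 * t),
                      np.sin(2 * np.pi * 300 * t)])
acg = correlogram(channels, frame_start=0, frame_len=320, max_lag=200)
print(summary_pitch_lag(acg))  # ~80 samples = 10 ms, i.e. a 100 Hz pitch
```

In the system described in the abstract, per-frame estimates of this kind would be the starting point for grouping spectral components and forming simultaneous pitch tracks; the sketch stops at a single summary estimate per frame.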
