Speech recognition robust against speech overlapping in monaural recordings of telephone conversations

Abstract

Monaural (single-channel) recording is sometimes used for telephone conversations in call centers. Generally speaking, automatic speech recognition is less accurate on a monaural recording than on a multi-channel recording of the same conversation in which each speaker's voice is recorded separately. The major reason is that the recognition system fails not only in the segments where the voices of multiple speakers overlap, but also in the neighboring segments surrounding them. In this paper, we tackle this problem by using a combination of garbage modeling and noise-robust monaural acoustic modeling. Our proposed method trains the models using multi-channel recordings and transcripts, which are easier to prepare than monaural recordings and transcripts. We present experimental results in which the proposed method reduced the error rates by approximately 3% relative to the baseline methods for both the GMM-HMM and CNN-HMM cases. Because the proposed method is quite simple, it is easy to deploy to a wide range of ASR systems for monaural speech transcription.
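
The abstract does not describe the training procedure in detail. Purely as an illustration, one plausible way to derive monaural training material from a two-channel recording is to average the channels and use the per-channel segment times to locate where the speakers overlap; such regions could then be given a garbage label. The function names and the interval format below are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def mix_to_mono(left: List[float], right: List[float]) -> List[float]:
    """Average two equally long channels sample by sample (assumed mixdown)."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [(a + b) / 2.0 for a, b in zip(left, right)]


def overlap_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the time spans where a segment of speaker A intersects one of speaker B.

    Both lists are assumed to be sorted by start time and to contain no
    overlapping segments within the same speaker.
    """
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever segment ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    agent = [(0.0, 2.0), (3.0, 5.0)]      # hypothetical segment times, channel 1
    customer = [(1.5, 3.5)]               # hypothetical segment times, channel 2
    print(overlap_intervals(agent, customer))
```

In this sketch the overlap spans are computed from the transcripts' timestamps alone, which is why separately recorded channels are convenient: each speaker's segments are known before the signals are mixed.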

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2016 | pp. 5685-5689 | 5 pages
Authors
Masayuki Suzuki; Gakuto Kurata; Tohru Nagano; Ryuki Tachibana
Original format
PDF
Keywords
Garbage model; Monaural speech; Noise robust; Overlap; Telephone conversation

Similar literature

1. Monaural speech separation based on MAXVQ and CASA for robust speech recognition [J]. Peng Li, Yong Guan, Shijin Wang. Computer Speech and Language, 2010, No. 1
2. Robust telephone speech recognition based on channel compensation [J]. Ham Jiqing, Gao Wen. Pattern Recognition: The Journal of the Pattern Recognition Society, 1999, No. 6
3. Signal bias removal by maximum likelihood estimation for robust telephone speech recognition [J]. Biing-Hwang Juang, Rahim M.G. IEEE Transactions on Speech and Audio Processing, 1996, No. 1
4. Speech recognition robust against speech overlapping in monaural recordings of telephone conversations [C]. Masayuki Suzuki, Gakuto Kurata, Tohru Nagano. IEEE International Conference on Acoustics, Speech and Signal Processing, 2016
5. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005
6. New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition [O]. Sanaz Seyedin, Seyed Mohammad Ahadi, Saeed Gazor. 2013
7. NMF based speech and music separation in monaural speech recordings with sparseness and temporal continuity constraints [O]. Tu Ming, Xie Xiang, Jiao Yishan. 2013
