Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition



Abstract

Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state-of-the-art model-based approach, which applies a single neural network to solve this single-input, multiple-output modeling problem. We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion. The modular structure splits the problem into three subtasks: frame-wise interpreting, utterance-level speaker tracing, and speech recognition. The pretraining regimen uses these modules to solve progressively harder tasks. Transfer learning leverages parallel clean speech to improve the training targets for the network. Our discriminative training formulation is a modification of standard formulations that also penalizes competing outputs of the system. Experiments are conducted on the artificially overlapped Switchboard and hub5e-swb datasets. The proposed framework achieves over 30% relative improvement in word error rate over both a strong jointly trained system (PIT for ASR) and a separately optimized system (PIT for speech separation followed by a clean-speech ASR model). The improvement comes from better model generalization, training efficiency, and sequence-level linguistic knowledge integration.
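The core idea behind PIT mentioned above is that with multiple overlapped speakers there is no inherent ordering of the network's output streams, so the loss is computed under every output-to-speaker assignment and the minimum is kept. A minimal sketch of that idea (a hypothetical NumPy illustration with a squared-error frame loss, not the paper's actual implementation or criterion):

```python
import itertools
import numpy as np

def pit_loss(outputs, targets):
    """Permutation-invariant loss over speaker streams.

    outputs, targets: arrays of shape (num_speakers, num_frames, feat_dim).
    Evaluates the mean squared error under every assignment of output
    streams to reference speakers and returns the minimum loss together
    with the best permutation.
    """
    num_spk = outputs.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(num_spk)):
        # Reorder the output streams according to this assignment and
        # score them against the fixed reference ordering.
        loss = np.mean((outputs[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

In a trained system the gradient flows only through the winning permutation, which is what resolves the speaker-ordering ambiguity of the single-input, multiple-output setup.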


