Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition



Abstract

Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state-of-the-art model-based approach, which applies a single neural network to solve this single-input, multiple-output modeling problem. We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion. The modular structure splits the problem into three subtasks: frame-wise interpreting, utterance-level speaker tracing, and speech recognition. The pretraining regimen uses these modules to solve progressively harder tasks. Transfer learning leverages parallel clean speech to improve the training targets for the network. Our discriminative training formulation is a modification of standard formulations that also penalizes competing outputs of the system. Experiments are conducted on the artificially overlapped Switchboard and hub5e-swb datasets. The proposed framework achieves over 30% relative improvement in word error rate over both a strong jointly trained system (PIT for ASR) and a separately optimized system (PIT for speech separation followed by a clean-speech ASR model). The improvement comes from better model generalization, training efficiency, and sequence-level linguistic knowledge integration.
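The core idea behind PIT mentioned above is that with multiple overlapped speakers there is no inherent ordering of the network's output streams, so the loss is computed under every output-to-speaker assignment and the minimum is kept. A minimal sketch of that idea (a hypothetical NumPy illustration with a squared-error frame loss, not the paper's actual implementation or criterion):

```python
import itertools
import numpy as np

def pit_loss(outputs, targets):
    """Permutation-invariant loss over speaker streams.

    outputs, targets: arrays of shape (num_speakers, num_frames, feat_dim).
    Evaluates the mean squared error under every assignment of output
    streams to reference speakers and returns the minimum loss together
    with the best permutation.
    """
    num_spk = outputs.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(num_spk)):
        # Reorder the output streams according to this assignment and
        # score them against the fixed reference ordering.
        loss = np.mean((outputs[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

In a trained system the gradient flows only through the winning permutation, which is what resolves the speaker-ordering ambiguity of the single-input, multiple-output setup.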


