Venue: IEEE International Conference on Acoustics, Speech and Signal Processing

End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection


Abstract

This paper integrates a voice activity detection (VAD) function into end-to-end automatic speech recognition, targeting online speech interfaces and the transcription of very long audio recordings. We focus on connectionist temporal classification (CTC) and its extension to CTC/attention architectures. Unlike attention-based architectures, CTC allows input-synchronous label prediction through a greedy search over the CTC (pre-)softmax output. This prediction contains long runs of consecutive blank labels, which can be regarded as non-speech regions. We use these labels as a cue for detecting speech segments with simple thresholding. The threshold value is directly related to the length of a non-speech region, which makes it more intuitive and easier to control than conventional VAD hyperparameters. Experimental results on unsegmented data show that the proposed method outperformed baseline methods using conventional energy-based and neural-network-based VAD and achieved a real-time factor (RTF) below 0.2. The proposed method is publicly available.
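
The blank-label thresholding described in the abstract can be illustrated with a minimal sketch. This is not the authors' released implementation: the function names, the blank index of 0, and the exact rule for closing a segment are assumptions made for illustration. The sketch takes frame-wise CTC outputs, picks the most likely label per frame (greedy search), and closes a speech segment whenever a run of consecutive blanks reaches the threshold.

```python
from typing import List, Sequence, Tuple

BLANK_ID = 0  # assumption: index of the CTC blank label


def greedy_labels(logits: Sequence[Sequence[float]]) -> List[int]:
    """Frame-wise argmax over CTC (pre-)softmax outputs (greedy search)."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def speech_segments(
    labels: Sequence[int], min_blank_run: int, blank_id: int = BLANK_ID
) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges regarded as speech.

    A run of at least `min_blank_run` consecutive blanks is treated as a
    non-speech region and closes the open segment; shorter blank runs
    remain inside the segment.
    """
    segments: List[Tuple[int, int]] = []
    start = None        # first non-blank frame of the open segment
    last_speech = None  # most recent non-blank frame
    for t, label in enumerate(labels):
        if label != blank_id:
            if start is None:
                start = t
            last_speech = t
        elif start is not None and t - last_speech >= min_blank_run:
            segments.append((start, last_speech + 1))
            start = None
    if start is not None:
        segments.append((start, last_speech + 1))
    return segments


# Hypothetical greedy output: two bursts of labels separated by four blanks.
example = [0, 0, 3, 0, 4, 0, 0, 0, 0, 5, 5, 0]
print(speech_segments(example, min_blank_run=3))  # [(2, 5), (9, 11)]
```

Because the threshold counts frames, it maps directly to a pause duration: with a hypothetical encoder frame shift of 40 ms, `min_blank_run=3` corresponds to a non-speech gap of roughly 120 ms.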
