End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection



Abstract

This paper integrates a voice activity detection (VAD) function into end-to-end automatic speech recognition, targeting online speech interfaces and the transcription of very long audio recordings. We focus on connectionist temporal classification (CTC) and its extension, the CTC/attention architecture. Unlike an attention-based architecture, CTC allows input-synchronous label prediction through a greedy search over the CTC (pre-)softmax output. The predicted label sequence contains long runs of consecutive blank labels, which can be regarded as non-speech regions. We use these labels as a cue for detecting speech segments with simple thresholding. The threshold value corresponds directly to the length of a non-speech region, which makes it more intuitive and easier to control than conventional VAD hyperparameters. Experimental results on unsegmented data show that the proposed method outperformed baseline methods using conventional energy-based and neural-network-based VAD, and achieved a real-time factor (RTF) of less than 0.2. The proposed method is publicly available.
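The segmentation idea in the abstract can be illustrated with a minimal Python sketch. The sketch makes several assumptions. The CTC blank has index 0. Per-frame scores arrive as a list of lists. The threshold is counted in encoder frames. It also assumes a 10 ms frame shift with 4x encoder subsampling when converting seconds to frames. The names `greedy_ctc_labels`, `detect_segments`, and `seconds_to_frames` are hypothetical and are not the authors' released implementation. Collapsing repeats and removing blanks to produce the transcript is omitted here, because only the per-frame labels are needed for segmentation.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: the CTC blank symbol has index 0 in the output vocabulary.
BLANK = 0


def greedy_ctc_labels(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Greedy (best-path) CTC search: pick the highest-scoring label per frame.

    Works on logits or softmax probabilities alike, because softmax does not
    change which entry is largest.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in frame_scores]


def seconds_to_frames(seconds: float, frame_shift_s: float = 0.01,
                      subsampling: int = 4) -> int:
    """Convert a pause length into encoder frames (hypothetical defaults)."""
    return max(1, round(seconds / (frame_shift_s * subsampling)))


def detect_segments(labels: Sequence[int], threshold: int,
                    blank: int = BLANK) -> List[Tuple[int, int]]:
    """Return speech segments as half-open [start, end) frame ranges.

    A run of at least `threshold` consecutive blanks closes the current
    segment; shorter blank runs stay inside it.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None   # first non-blank frame of the open segment
    last_label = 0                # last non-blank frame of the open segment
    blank_run = 0                 # length of the current run of blanks
    for t, label in enumerate(labels):
        if label == blank:
            blank_run += 1
            if start is not None and blank_run >= threshold:
                segments.append((start, last_label + 1))
                start = None
        else:
            if start is None:
                start = t
            last_label = t
            blank_run = 0
    if start is not None:  # close a segment still open at the end of input
        segments.append((start, last_label + 1))
    return segments


if __name__ == "__main__":
    labels = [0, 0, 5, 0, 7, 0, 0, 0, 0, 0, 3, 3, 0]
    print(detect_segments(labels, threshold=4))  # [(2, 5), (10, 12)]
    print(detect_segments(labels, threshold=6))  # [(2, 12)]
    print(seconds_to_frames(1.6))                # 40
```

Because the threshold counts blank frames, it maps directly to a minimum pause duration. With the assumed 40 ms per encoder frame, a threshold of 40 frames corresponds to 1.6 s of non-speech. Shorter blank runs, such as those between CTC spikes within an utterance, stay inside one segment.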


Bibliographic Record

Source: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2020 | pp. 6824-7443 | 5 pages
Authors: Takenori Yoshimura; Tomoki Hayashi; Kazuya Takeda; Shinji Watanabe
Original format: PDF
Chinese Library Classification (CLC): TN912-53
Keywords: Speech recognition; end-to-end; voice activity detection; streaming; CTC greedy search


Similar Documents


1. Statistical voice activity detection based on integrated bispectrum likelihood ratio tests for robust speech recognition [J]. Ramirez J, Gorriz JM, Segura JC. The Journal of the Acoustical Society of America. 2007, no. 5.

2. Bridging automatic speech recognition and psycholinguistics: Extending Shortlist to an end-to-end model of human speech recognition (L) [J]. Odette Scharenborg, Louis ten Bosch, Lou Boves. The Journal of the Acoustical Society of America. 2003, no. 6.

3. Intelligent Interface Based Voice Activity Detector and Automatic Speech Recognition for Home Automation in WSN – a Survey [J]. Tharaniya soundhari.M, Brilly Sangeetha.S. International Journal of Computer Trends and Technology. 2014, no. 1.

4. End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection [C]. Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda. IEEE International Conference on Acoustics, Speech and Signal Processing. 2020.

5. Advances in Audiovisual Speech Processing for Robust Voice Activity Detection and Automatic Speech Recognition [D]. Tao, Fei. 2018.

6. A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech [O]. László Tóth, Ildikó Hoffmann, Gábor Gosztolya.

7. Voice Activity Detection and Garbage Modelling for a Mobile Automatic Speech Recognition Application [O]. Ishaq Muhammad. 2017.

