Speech/Nonspeech Segmentation in Web Videos

Abstract

Speech transcription of web videos requires first detecting segments with transcribable speech. We refer to this as segmentation. Commonly used segmentation techniques are inadequate for domains such as YouTube, where videos may have a large variety of background and recording conditions. In this work, we investigate alternative audio features and a discriminative classifier, which together yield a lower frame error rate (25.3%) on YouTube videos compared to the commonly used Gaussian mixture models trained on cepstral features (30.6%). The alternative audio features perform particularly well in noisy conditions.
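The abstract reports results as frame error rates but does not define the metric or the frame length. The minimal Python sketch below assumes the usual binary definition: each fixed-length frame carries one label (1 = speech, 0 = nonspeech), and the frame error rate is the share of frames whose predicted label disagrees with the reference. It also shows how frame-level decisions can be merged into the speech segments that a transcriber would receive. The function names `frame_error_rate` and `frames_to_segments` are hypothetical illustrations, not the paper's evaluation code. For scale, the drop from 30.6% to 25.3% is 5.3 points absolute, or roughly 17% relative.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration only; not the paper's evaluation code.
# Assumption: one binary label per fixed-length frame (1 = speech, 0 = nonspeech).


def frame_error_rate(reference: List[int], hypothesis: List[int]) -> float:
    """Return the fraction of frames whose hypothesis label differs from the reference."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same frames")
    if not reference:
        raise ValueError("at least one frame is required")
    errors = sum(1 for r, h in zip(reference, hypothesis) if r != h)
    return errors / len(reference)


def frames_to_segments(labels: List[int]) -> List[Tuple[int, int]]:
    """Merge runs of consecutive speech frames into half-open (start, end) frame ranges."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the current speech run, if any
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i  # a speech run begins at this frame
        elif label != 1 and start is not None:
            segments.append((start, i))  # the run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(labels)))  # the run extends to the last frame
    return segments


if __name__ == "__main__":
    ref = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
    hyp = [0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
    print(frame_error_rate(ref, hyp))  # 0.2 (2 of 10 frames disagree)
    print(frames_to_segments(hyp))  # [(1, 5), (8, 10)]
```

Returning segments as half-open frame-index ranges avoids floating-point rounding; multiplying the indices by an assumed frame step (for example 10 ms) converts them to seconds.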

Bibliographic Details

Source: INTERSPEECH 2012 | 2012 | 4 pages
Author: Ananya Misra
Original format: PDF
Chinese Library Classification: 73.4136083
Keywords: segmentation; speech detection; voice activity detection; video

Similar Documents

1. Visual phonetic processing localized using speech and nonspeech face gestures in video and point-light displays [J]. Bernstein LE, Jiang J, Pantazis D. Human Brain Mapping, 2011, no. 10.

2. Tagging Webcast Text in Baseball Videos by Video Segmentation and Text Alignment [J]. Chiu C.-Y. IEEE Transactions on Circuits and Systems for Video Technology, 2012, no. 7.

3. Auditory event-related potentials index faster processing of natural speech but not synthetic speech over nonspeech analogs in children [J]. Whitten Allison, Key Alexandra P., Mefferd Antje S. Brain and Language, 2020, no. 1.

4. Speech/Nonspeech Segmentation in Web Videos [C]. Ananya Misra. Annual Conference of the International Speech Communication Association (INTERSPEECH), 2012.

5. Speech and Nonspeech Production in the Absence of the Vocal Tract [D]. Thompson, Megan. 2018.

6. Modulation of Auditory Responses to Speech vs. Nonspeech Stimuli during Speech Movement Planning [O]. Ayoub Daliri, Ludo Max. 2016.
