Speech/Nonspeech Segmentation in Web Videos



Abstract

Speech transcription of web videos requires first detecting segments with transcribable speech. We refer to this as segmentation. Commonly used segmentation techniques are inadequate for domains such as YouTube, where videos may have a large variety of background and recording conditions. In this work, we investigate alternative audio features and a discriminative classifier, which together yield a lower frame error rate (25.3%) on YouTube videos compared to the commonly used Gaussian mixture models trained on cepstral features (30.6%). The alternative audio features perform particularly well in noisy conditions.
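
The frame error rates quoted above can be read as the share of audio frames given the wrong speech/nonspeech label. The sketch below is only an illustration under assumptions not stated in the abstract: labels are taken to be one integer per frame (1 for speech, 0 for nonspeech), the function names are hypothetical, and the toy label sequences are invented. It also works out the relative reduction implied by the two reported rates.

```python
from typing import Sequence


def frame_error_rate(reference: Sequence[int], hypothesis: Sequence[int]) -> float:
    """Fraction of frames whose predicted label differs from the reference.

    Assumption: one label per frame, 1 for speech and 0 for nonspeech.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    if len(reference) == 0:
        raise ValueError("at least one frame is required")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative drop in error rate when moving from baseline to improved."""
    return (baseline - improved) / baseline


# Toy labels, invented for illustration only (not data from the paper).
reference = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
hypothesis = [0, 1, 1, 1, 0, 1, 0, 0, 1, 0]
fer = frame_error_rate(reference, hypothesis)  # 3 mismatched frames out of 10

# Error rates reported in the abstract: 30.6% (GMM baseline) vs 25.3%.
gain = relative_reduction(0.306, 0.253)
print(f"toy frame error rate: {fer:.1%}")
print(f"relative reduction from reported rates: {gain:.1%}")
```

On the reported figures, the absolute improvement is 5.3 percentage points, which corresponds to a relative reduction of roughly 17%.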


Bibliographic details

Source: Annual Conference of the International Speech Communication Association, 2012, pp. 1975-1978 (4 pages)
Author: Ananya Misra
Original format: PDF
Keywords: segmentation; speech detection; voice activity detection; video


Similar literature


1. Visual phonetic processing localized using speech and nonspeech face gestures in video and point-light displays [J]. Bernstein LE, Jiang J, Pantazis D. Human Brain Mapping. 2011, No. 10.
2. Tagging Webcast Text in Baseball Videos by Video Segmentation and Text Alignment [J]. Chiu C.-Y. IEEE Transactions on Circuits and Systems for Video Technology. 2012, No. 7.
3. Auditory event-related potentials index faster processing of natural speech but not synthetic speech over nonspeech analogs in children [J]. Whitten Allison, Key Alexandra P., Mefferd Antje S. Brain and Language. 2020, No. 1.
4. Speech/Nonspeech Segmentation in Web Videos [C]. Ananya Misra. INTERSPEECH 2012. 2012.
5. Speech and Nonspeech Production in the Absence of the Vocal Tract [D]. Thompson, Megan. 2018.
6. Modulation of Auditory Responses to Speech vs. Nonspeech Stimuli during Speech Movement Planning [O]. Ayoub Daliri, Ludo Max. 2016.
