Speech/Nonspeech Segmentation in Web Videos

Abstract

Speech transcription of web videos requires first detecting segments with transcribable speech. We refer to this as segmentation. Commonly used segmentation techniques are inadequate for domains such as YouTube, where videos may have a large variety of background and recording conditions. In this work, we investigate alternative audio features and a discriminative classifier, which together yield a lower frame error rate (25.3%) on YouTube videos compared to the commonly used Gaussian mixture models trained on cepstral features (30.6%). The alternative audio features perform particularly well in noisy conditions.
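
The abstract does not define frame error rate. A minimal sketch, assuming it is the fraction of audio frames whose predicted speech/nonspeech label differs from the reference label; the function name, the 0/1 label encoding, and the toy data below are illustrative assumptions, not taken from the paper:

```python
from typing import Sequence


def frame_error_rate(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of frames whose predicted label differs from the reference.

    Assumed encoding (not from the paper): 1 = speech, 0 = nonspeech.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same number of frames")
    if len(reference) == 0:
        raise ValueError("at least one frame is required")
    # Count frames where the two label sequences disagree.
    errors = sum(r != p for r, p in zip(reference, predicted))
    return errors / len(reference)


# Toy example: 8 frames, 2 of which are misclassified.
reference = [0, 0, 1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 0, 1, 0, 0]
print(f"frame error rate: {frame_error_rate(reference, predicted):.1%}")
```

Under this definition, the reported 25.3% and 30.6% would mean that roughly one in four and nearly one in three frames, respectively, were labeled incorrectly.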