The Sheffield Wargame Corpus - Day Two and Day Three

Abstract

Improving the performance of distant speech recognition is of considerable current interest, driven by a desire to bring speech recognition into people's homes. Standard approaches to this task aim to enhance the signal prior to recognition, typically by applying beamforming techniques to multiple channels. Only a few real-world recordings are available that allow experimentation with such techniques. This scarcity has become even more pertinent with recent work on deep neural networks that aim to learn beamforming from data. Such approaches require large multi-channel training sets, ideally with location annotation for moving speakers, which is scarce in existing corpora. This paper presents a new, freely available extended corpus of English speech recordings in a natural setting, with moving speakers. The data is recorded with diverse microphone arrays and, uniquely, with ground-truth location tracking. It extends the 8.0-hour Sheffield Wargames Corpus, released at Interspeech 2013, with a further 16.6 hours of fully annotated data, including 6.1 hours of female speech to reduce gender bias. Additional blog-based language model data is provided alongside a Kaldi baseline system. Results are reported with a standard Kaldi configuration and a baseline meeting recognition system.
