IEEE International Conference on Acoustics, Speech and Signal Processing

Lessons from Building Acoustic Models with a Million Hours of Speech



Abstract

This is a report of lessons learned while building acoustic models from 1 million hours of unlabeled speech, with labeled speech restricted to 7,000 hours. We employ student/teacher training on the unlabeled data, which scales out target generation better than confidence-model-based methods, since those require both a decoder and a confidence model. To optimize storage and to parallelize target generation, we store only the highest-valued logits from the teacher model. Introducing the notion of scheduled learning, we interleave learning on unlabeled and labeled data. To scale distributed training across a large number of GPUs, we use BMUF with 64 GPUs, while sequence training is performed only on labeled data, using gradient-threshold-compression SGD on 16 GPUs. Our experiments show that extremely large amounts of data are indeed useful: with little hyper-parameter tuning, we obtain relative WER improvements in the 10 to 20% range, with higher gains in noisier conditions.
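To make the logit-storage idea concrete, below is a minimal PyTorch sketch of storing only the teacher's top-K logits per frame and training a student against the truncated, renormalized teacher posterior. The paper does not publish code; the value of K, the temperature-free formulation, and all function names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

K = 20  # hypothetical: keep only the K largest logits per frame


def compress_teacher_logits(teacher_logits):
    """Keep the top-K logit values and their class indices per frame.

    teacher_logits: (frames, num_senones). Storing (values, indices)
    instead of the dense logit tensor is what makes offline target
    generation cheap to store and easy to parallelize across workers.
    """
    values, indices = torch.topk(teacher_logits, K, dim=-1)
    return values, indices


def distillation_loss(student_logits, top_values, top_indices):
    """Cross-entropy of the student against the teacher's truncated
    posterior, renormalized over the stored top-K classes only."""
    # Teacher posterior over the K surviving classes.
    teacher_probs = F.softmax(top_values, dim=-1)
    # Student log-probs gathered at the same K class indices.
    student_logp = F.log_softmax(student_logits, dim=-1)
    student_topk = torch.gather(student_logp, -1, top_indices)
    return -(teacher_probs * student_topk).sum(dim=-1).mean()
```

Renormalizing over the stored classes keeps the target a valid distribution despite the truncation, which is what allows the dense softmax output to be discarded at generation time.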
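The "scheduled learning" idea of interleaving unlabeled and labeled data can likewise be sketched as a batch scheduler. The interleaving ratio and loader names below are assumptions for illustration, not values from the paper.

```python
from itertools import cycle


def interleaved_batches(unlabeled_loader, labeled_loader,
                        unlabeled_per_labeled=4):
    """Yield (batch, is_labeled) pairs: N unlabeled batches, then 1 labeled.

    The labeled loader is cycled because the labeled pool is roughly
    140x smaller (7,000 h vs. 1 million h) and would otherwise be
    exhausted long before one pass over the unlabeled data.
    """
    labeled_iter = cycle(labeled_loader)
    for i, batch in enumerate(unlabeled_loader):
        yield batch, False
        if (i + 1) % unlabeled_per_labeled == 0:
            yield next(labeled_iter), True
```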
