IEEE International Conference on Acoustics, Speech and Signal Processing

Lessons from Building Acoustic Models with a Million Hours of Speech



Abstract

This is a report of lessons learned while building acoustic models from 1 million hours of unlabeled speech, with labeled speech restricted to 7,000 hours. We employ student/teacher training on the unlabeled data, which scales out target generation better than confidence-model-based methods, since those require both a decoder and a confidence model. To optimize storage and to parallelize target generation, we store only the highest-valued logits from the teacher model. Introducing the notion of scheduled learning, we interleave learning on unlabeled and labeled data. To scale distributed training across a large number of GPUs, we use BMUF with 64 GPUs, while sequence training is performed only on labeled data, using gradient-threshold-compression SGD on 16 GPUs. Our experiments show that extremely large amounts of data are indeed useful: with little hyper-parameter tuning, we obtain relative WER improvements in the 10 to 20% range, with higher gains in noisier conditions.
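To make the logit-storage idea concrete, below is a minimal PyTorch sketch of storing only the teacher's top-K logits per frame and training a student against the truncated, renormalized teacher posterior. The paper does not publish code; the value of K, the temperature-free formulation, and all function names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

K = 20  # hypothetical: keep only the K largest logits per frame


def compress_teacher_logits(teacher_logits):
    """Keep the top-K logit values and their class indices per frame.

    teacher_logits: (frames, num_senones). Storing (values, indices)
    instead of the dense logit tensor is what makes offline target
    generation cheap to store and easy to parallelize across workers.
    """
    values, indices = torch.topk(teacher_logits, K, dim=-1)
    return values, indices


def distillation_loss(student_logits, top_values, top_indices):
    """Cross-entropy of the student against the teacher's truncated
    posterior, renormalized over the stored top-K classes only."""
    # Teacher posterior over the K surviving classes.
    teacher_probs = F.softmax(top_values, dim=-1)
    # Student log-probs gathered at the same K class indices.
    student_logp = F.log_softmax(student_logits, dim=-1)
    student_topk = torch.gather(student_logp, -1, top_indices)
    return -(teacher_probs * student_topk).sum(dim=-1).mean()
```

Renormalizing over the stored classes keeps the target a valid distribution despite the truncation, which is what allows the dense softmax output to be discarded at generation time.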
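The "scheduled learning" idea of interleaving unlabeled and labeled data can likewise be sketched as a batch scheduler. The interleaving ratio and loader names below are assumptions for illustration, not values from the paper.

```python
from itertools import cycle


def interleaved_batches(unlabeled_loader, labeled_loader,
                        unlabeled_per_labeled=4):
    """Yield (batch, is_labeled) pairs: N unlabeled batches, then 1 labeled.

    The labeled loader is cycled because the labeled pool is roughly
    140x smaller (7,000 h vs. 1 million h) and would otherwise be
    exhausted long before one pass over the unlabeled data.
    """
    labeled_iter = cycle(labeled_loader)
    for i, batch in enumerate(unlabeled_loader):
        yield batch, False
        if (i + 1) % unlabeled_per_labeled == 0:
            yield next(labeled_iter), True
```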
