Spoken Language Technology Workshop

Towards Unsupervised Learning of Speech Features in the Wild

Abstract

Recent work on unsupervised contrastive learning of speech representations has shown promising results, but so far it has mostly been applied to clean, curated speech datasets. Can it also be used with unprepared audio data "in the wild"? Here, we explore three potential problems in this setting: (i) the presence of non-speech data, (ii) noisy or low-quality speech data, and (iii) imbalance in the speaker distribution. We show that on the Libri-light train set, which is itself a relatively clean speech-only dataset, these problems combined can already incur a performance cost of up to 30% relative on the ABX score. We show that the first two problems can be alleviated by data filtering: voice activity detection selects speech segments, while the perplexity of a model trained on clean data helps to discard entire files. We show that the third problem can be alleviated by learning a speaker embedding in the predictive branch of the model. We show that these techniques build more robust speech features that can be transferred to an ASR task in a low-resource setting.
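The two filtering steps described in the abstract can be sketched as a simple pipeline: keep only segments a voice activity detector flags as speech, and drop whole files whose perplexity under a model trained on clean data is too high. This is a minimal illustration, not the authors' implementation; the data layout, the VAD probability field, and the perplexity threshold are all hypothetical.

```python
def filter_corpus(files, ppl_threshold=500.0, vad_threshold=0.5):
    """Illustrative two-stage data filter.

    files: list of dicts, each with
      'ppl'      -- file-level perplexity under a model trained on clean data
      'segments' -- list of dicts with a 'vad' speech probability
    """
    kept = []
    for f in files:
        # Stage 1: discard the entire file if it looks too unlike clean speech.
        if f["ppl"] > ppl_threshold:
            continue
        # Stage 2: within surviving files, keep only VAD-selected speech segments.
        speech = [s for s in f["segments"] if s["vad"] >= vad_threshold]
        if speech:
            kept.append({"ppl": f["ppl"], "segments": speech})
    return kept

corpus = [
    {"ppl": 120.0, "segments": [{"vad": 0.9}, {"vad": 0.2}]},  # one segment is non-speech
    {"ppl": 900.0, "segments": [{"vad": 0.8}]},                # whole file discarded
]
clean = filter_corpus(corpus)
```

In this toy example, the second file is rejected by the perplexity check and the first keeps only its high-VAD segment.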
