Spoken Language Technology Workshop

Towards Unsupervised Learning of Speech Features in the Wild

Abstract

Recent work on unsupervised contrastive learning of speech representations has shown promising results, but so far it has mostly been applied to clean, curated speech datasets. Can it also be used with unprepared audio data "in the wild"? Here, we explore three potential problems in this setting: (i) the presence of non-speech data, (ii) noisy or low-quality speech data, and (iii) imbalance in the speaker distribution. We show that on the Libri-light train set, which is itself a relatively clean speech-only dataset, these problems combined can already incur a performance cost of up to 30% relative on the ABX score. We show that the first two problems can be alleviated by data filtering: voice activity detection selects speech segments, while the perplexity of a model trained on clean data helps to discard entire files. We show that the third problem can be alleviated by learning a speaker embedding in the predictive branch of the model. We show that these techniques build more robust speech features that can be transferred to an ASR task in a low-resource setting.
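The two filtering steps described in the abstract can be sketched as a simple pipeline: keep only segments a voice activity detector flags as speech, and drop whole files whose perplexity under a model trained on clean data is too high. This is a minimal illustration, not the authors' implementation; the data layout, the VAD probability field, and the perplexity threshold are all hypothetical.

```python
def filter_corpus(files, ppl_threshold=500.0, vad_threshold=0.5):
    """Illustrative two-stage data filter.

    files: list of dicts, each with
      'ppl'      -- file-level perplexity under a model trained on clean data
      'segments' -- list of dicts with a 'vad' speech probability
    """
    kept = []
    for f in files:
        # Stage 1: discard the entire file if it looks too unlike clean speech.
        if f["ppl"] > ppl_threshold:
            continue
        # Stage 2: within surviving files, keep only VAD-selected speech segments.
        speech = [s for s in f["segments"] if s["vad"] >= vad_threshold]
        if speech:
            kept.append({"ppl": f["ppl"], "segments": speech})
    return kept

corpus = [
    {"ppl": 120.0, "segments": [{"vad": 0.9}, {"vad": 0.2}]},  # one segment is non-speech
    {"ppl": 900.0, "segments": [{"vad": 0.8}]},                # whole file discarded
]
clean = filter_corpus(corpus)
```

In this toy example, the second file is rejected by the perplexity check and the first keeps only its high-VAD segment.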
