
Personalized Acoustic Modeling by Weakly Supervised Multi-Task Deep Learning using Acoustic Tokens Discovered from Unlabeled Data



Abstract

It is well known that recognizers personalized to each user are much more effective than user-independent recognizers. With the popularity of smartphones today, although it is not difficult to collect a large set of audio data for each user, it is difficult to transcribe it. However, it is now possible to automatically discover acoustic tokens from unlabeled personal data in an unsupervised way. We therefore propose a multi-task deep learning framework called a phoneme-token deep neural network (PTDNN), jointly trained from unsupervised acoustic tokens discovered from unlabeled data and very limited transcribed data for personalized acoustic modeling. We term this scenario "weakly supervised". The underlying intuition is that the high degree of similarity between the HMM states of acoustic token models and phoneme models may help them learn from each other in this multi-task learning framework. Initial experiments performed over a personalized audio data set recorded from Facebook posts demonstrated that very good improvements can be achieved in both frame accuracy and word accuracy over popularly-considered baselines such as fDLR, speaker code and lightly supervised adaptation. This approach complements existing speaker adaptation approaches and can be used jointly with such techniques to yield improved results.
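
The multi-task idea in the abstract can be illustrated with a small sketch: one shared hidden layer feeds two softmax heads, one over phoneme HMM states and one over acoustic-token HMM states. Every frame contributes a token loss, while only transcribed frames add a phoneme loss. This is not the authors' PTDNN architecture. The class name `TwoHeadNet`, the function `joint_loss`, the single tanh hidden layer, the layer sizes, and the `token_weight` parameter are all hypothetical choices made for illustration, and only the forward pass and loss are shown.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def affine(W, b, x):
    """Compute W @ x + b for plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (an arbitrary choice)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class TwoHeadNet:
    """Hypothetical two-task network: shared hidden layer, two output heads."""

    def __init__(self, n_in, n_hidden, n_phoneme_states, n_token_states, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(rng, n_hidden, n_in)
        self.phoneme_head = init_layer(rng, n_phoneme_states, n_hidden)
        self.token_head = init_layer(rng, n_token_states, n_hidden)

    def forward(self, frame):
        # The hidden representation is shared by both tasks.
        h = [math.tanh(v) for v in affine(*self.shared, frame)]
        p_phoneme = softmax(affine(*self.phoneme_head, h))
        p_token = softmax(affine(*self.token_head, h))
        return p_phoneme, p_token


def joint_loss(net, frame, token_state, phoneme_state=None, token_weight=1.0):
    """Cross-entropy for one frame.

    token_state: label from unsupervised token discovery (assumed available
    for every frame). phoneme_state: label from the limited transcribed data,
    or None for untranscribed frames. token_weight is an assumed knob, not a
    value from the paper.
    """
    p_phoneme, p_token = net.forward(frame)
    loss = -token_weight * math.log(p_token[token_state])
    if phoneme_state is not None:
        loss -= math.log(p_phoneme[phoneme_state])
    return loss


net = TwoHeadNet(n_in=4, n_hidden=8, n_phoneme_states=3, n_token_states=5)
frame = [0.1, -0.2, 0.3, 0.0]
untranscribed = joint_loss(net, frame, token_state=2)
transcribed = joint_loss(net, frame, token_state=2, phoneme_state=1)
```

In this sketch an untranscribed frame still updates the shared layer through the token head, which is one way to read the abstract's intuition that the two sets of HMM states can help each other learn.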
