IEICE Transactions on Information and Systems

Domain Adaptation Based on Mixture of Latent Words Language Models for Automatic Speech Recognition



Abstract

This paper proposes a novel domain adaptation method that can utilize out-of-domain text resources and partially domain-matched text resources in language modeling. A major problem in domain adaptation is that it is hard to obtain adequate adaptation effects from out-of-domain text resources. To tackle this problem, our idea is to carry out model merging in a latent variable space created from latent words language models (LWLMs). The latent variables in LWLMs are represented as specific words selected from the observed word space, so different LWLMs can share a common latent variable space. This enables flexible mixture modeling that takes the latent variable space into account. This paper presents two types of mixture modeling: LWLM mixture models and LWLM cross-mixture models. The LWLM mixture models perform mixture modeling in the latent word space to mitigate the domain mismatch problem. Furthermore, in the LWLM cross-mixture models, LMs that are individually constructed from partially matched text resources are split into two element models, each of which can be subjected to mixture modeling. For both approaches, this paper also describes methods to optimize the mixture weights using a validation data set. Experiments show that mixture modeling in the latent word space achieves performance improvements on both the target domain and out-of-domain data compared with mixture modeling in the observed word space.
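The abstract does not give the model equations, so the following is only a toy sketch under stated assumptions. It assumes the common LWLM factorization into two element models: a latent-word model P(h | latent context) (called `transitions` here) and an emission model P(w | h) (called `emissions`). It contrasts mixing whole component models in the observed word space with mixing each element model separately in the shared latent word space, which is one plausible reading of the mixture/cross-mixture idea, not the paper's confirmed formulation. The weight estimation uses the standard EM update for linear-interpolation weights on validation tokens; the paper's own optimization method may differ. All function names and the list-of-lists data layout are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical toy representation (not the paper's data structures):
#   transitions[m][h]  = P_m(h | fixed latent context) for component LWLM m
#   emissions[m][h][w] = P_m(w | h), latent word -> observed word


def observed_space_mixture(transitions, emissions, lam) -> List[float]:
    """Mix complete components: sum_m lam_m * sum_h P_m(w|h) P_m(h|ctx)."""
    vocab = len(emissions[0][0])
    out = [0.0] * vocab
    for lam_m, trans_m, emit_m in zip(lam, transitions, emissions):
        for h, p_h in enumerate(trans_m):
            for w in range(vocab):
                out[w] += lam_m * p_h * emit_m[h][w]
    return out


def latent_space_mixture(transitions, emissions, lam, mu) -> List[float]:
    """Assumed latent-space mixture: mix each element model separately,
    P(w|ctx) = sum_h [sum_m mu_m P_m(w|h)] * [sum_m lam_m P_m(h|ctx)]."""
    num_latent = len(transitions[0])
    vocab = len(emissions[0][0])
    mixed_h = [sum(l * t[h] for l, t in zip(lam, transitions))
               for h in range(num_latent)]
    out = [0.0] * vocab
    for h in range(num_latent):
        for w in range(vocab):
            p_w_given_h = sum(u * e[h][w] for u, e in zip(mu, emissions))
            out[w] += p_w_given_h * mixed_h[h]
    return out


def em_mixture_weights(component_probs: Sequence[Sequence[float]],
                       iterations: int = 50) -> List[float]:
    """Standard EM for linear-interpolation weights on validation data.
    component_probs[t][m] = probability component m gives validation token t."""
    num_models = len(component_probs[0])
    weights = [1.0 / num_models] * num_models  # uniform initialization
    for _ in range(iterations):
        expected = [0.0] * num_models
        for probs in component_probs:
            mixed = sum(wt * p for wt, p in zip(weights, probs))
            # E-step: posterior responsibility of each component for this token
            for m in range(num_models):
                expected[m] += weights[m] * probs[m] / mixed
        # M-step: normalized expected counts become the new weights
        weights = [c / len(component_probs) for c in expected]
    return weights


def perplexity(component_probs, weights) -> float:
    """Perplexity of the interpolated model on the validation tokens."""
    log_sum = sum(math.log(sum(wt * p for wt, p in zip(weights, probs)))
                  for probs in component_probs)
    return math.exp(-log_sum / len(component_probs))


if __name__ == "__main__":
    # Two toy components that map the two latent words to opposite observed words.
    transitions = [[1.0, 0.0], [0.0, 1.0]]
    emissions = [[[1.0, 0.0], [0.0, 1.0]],
                 [[0.0, 1.0], [1.0, 0.0]]]
    print(observed_space_mixture(transitions, emissions, [0.5, 0.5]))  # [1.0, 0.0]
    print(latent_space_mixture(transitions, emissions,
                               [0.5, 0.5], [0.5, 0.5]))  # [0.5, 0.5]

    # Hypothetical per-token validation probabilities from two components.
    validation = [[0.5, 0.1], [0.4, 0.2], [0.3, 0.1]]
    tuned = em_mixture_weights(validation)
    print(tuned, perplexity(validation, tuned), perplexity(validation, [0.5, 0.5]))
```

In the toy example, the latent-space mixture pairs one component's latent-word model with the other component's emission model, so its output differs from the observed-space mixture even with equal weights. Because EM never decreases validation likelihood, the tuned weights give a perplexity no higher than the uniform starting point.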
