Venue: Annual Conference of the International Speech Communication Association

Supervised and unsupervised Web-based language model domain adaptation

Abstract

Domain adaptation of a language model (LM) consists in re-estimating the probabilities of a baseline LM to better match the specifics of a given broad topic of interest. A common strategy is to retrieve adaptation texts from the Web based on a given domain-representative seed text. In this paper, we study how the selection of this seed text influences the adaptation process and the performance of the resulting adapted language models in automatic speech recognition. More precisely, the goal of this study is to analyze how our Web-based adaptation approach differs between the supervised case, in which the seed text is manually generated, and the unsupervised case, in which the seed text is given by an automatic transcript. Experiments were carried out on data from a real-world use case: videos produced for a university YouTube channel. Results show that our approach is quite robust, since unsupervised adaptation provides performance similar to the supervised case in terms of overall perplexity and word error rate.
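
The two evaluation measures named in the abstract, perplexity and word error rate, can be illustrated with a minimal Python sketch. This is not the authors' evaluation setup: the function names are hypothetical, the language model is an add-one-smoothed unigram model chosen only for brevity, and the vocabulary is assumed to be the union of training and test words.

```python
import math
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def unigram_perplexity(train_text: str, test_text: str) -> float:
    """Perplexity of an add-one-smoothed unigram model on test_text.

    Assumption for this toy example: the vocabulary is the union of the
    training and test words, so unseen test words get a nonzero probability.
    """
    counts = Counter(train_text.split())
    test = test_text.split()
    if not test:
        raise ValueError("test_text must contain at least one word")
    vocab_size = len(set(counts) | set(test))
    total = sum(counts.values())
    log_prob = sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in test
    )
    return math.exp(-log_prob / len(test))
```

Under these assumptions, a model trained on text closer to the target domain should give a lower perplexity on held-out domain text, and a recognizer using a better-matched LM should give a lower word error rate against a manual reference transcript.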
