IEEE Transactions on Speech and Audio Processing

A hierarchical language model based on variable-length class sequences: the MCnν approach


Abstract

We propose a new language model that represents long-term dependencies between word sequences using a multilevel hierarchy. We call this model MCnν, where n is the maximum number of words in a sequence and ν is the maximum number of levels. The originality of this model, which is an extension of multigrams, is its ability to capture long-distance dependencies through dependent variable-length sequences. To discover the variable-length sequences and build the hierarchy, we use a set of 233 syntactic classes extracted from eight elementary grammatical classes of French. The MCnν model learns hierarchical word patterns and uses them to reevaluate and filter the n-best utterance hypotheses output by our speech recognizer MAUD. The model was trained on a corpus of 43 million words extracted from the French newspaper "Le Monde" and uses a vocabulary of 20 000 words. Tests were conducted on 300 sentences. Compared to the class trigram and the baseline multigram approach, we report perplexity reductions of 17% and 20%, respectively. Rescoring the original n-best hypotheses improved the word error rate by 7% and 2% compared to the class trigram and multigrams, respectively.
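
The two evaluation steps mentioned in the abstract, perplexity and n-best rescoring, can be illustrated with a minimal Python sketch. This is not the MCnν model or the MAUD recognizer: the unigram table `TOY_UNIGRAM`, the function names, the acoustic scores, and the `lm_weight` parameter are all hypothetical stand-ins chosen only to show the general mechanics of combining a language-model score with a recognizer score.

```python
import math
from typing import Callable, List, Tuple

# A hypothesis is (word sequence, acoustic log-score from the recognizer).
Hypothesis = Tuple[List[str], float]


def perplexity(log_probs: List[float]) -> float:
    """Perplexity from per-word natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


def rescore(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[List[str]], float],
    lm_weight: float = 1.0,
) -> List[Hypothesis]:
    """Sort hypotheses best-first by acoustic score + weighted LM log-probability."""
    return sorted(
        nbest,
        key=lambda h: h[1] + lm_weight * lm_logprob(h[0]),
        reverse=True,
    )


# Hypothetical toy unigram model standing in for a real language model.
TOY_UNIGRAM = {"le": 0.4, "chat": 0.3, "dort": 0.2, "dore": 0.1}


def toy_lm_logprob(words: List[str]) -> float:
    """Sum of unigram log-probabilities; unseen words get a small floor."""
    return sum(math.log(TOY_UNIGRAM.get(w, 1e-6)) for w in words)


# Two made-up hypotheses: the acoustically better one ends in a less likely word.
nbest = [
    (["le", "chat", "dore"], -10.0),
    (["le", "chat", "dort"], -10.2),
]
best_words, _ = rescore(nbest, toy_lm_logprob)[0]
```

In this toy case the language-model score outweighs the small acoustic gap, so the second hypothesis moves to the top of the list. A real system would replace `toy_lm_logprob` with the trained model and tune `lm_weight` on held-out data.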
