
Impact of Word Classing on Shrinkage-Based Language Models

Abstract

This paper investigates the impact of word classing on a recently proposed shrinkage-based language model, Model M [5]. Model M, a class-based n-gram model, has been shown to significantly outperform word-based n-gram models on a variety of domains. In past work, word classes for Model M were induced automatically from unlabeled text using the algorithm of [2]. We take a closer look at the classing and attempt to find out whether improved classing would also translate to improved performance. In particular, we explore the use of manually-assigned classes, part-of-speech (POS) tags, and dialog state information, considering both hard classing and soft classing. In experiments with a conversational dialog system (human-machine dialog) and a speech-to-speech translation system (human-human dialog), we find that better classing can improve Model M performance by up to 3% absolute in word-error rate.
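To make the class-based n-gram idea concrete, here is a minimal toy sketch of the standard factorization p(w_i | history) ≈ p(c_i | c_{i-1}) · p(w_i | c_i), where each word w belongs to a class c. The corpus, the hand-assigned classes, and all function names below are illustrative assumptions only; this is not the Model M parameterization or the class-induction algorithm from the paper.

```python
from collections import defaultdict

# Toy corpus of (word, class) pairs; classes are hand-assigned here
# purely for illustration (cf. the paper's manually-assigned classes
# and POS tags), not induced automatically.
corpus = [("i", "PRON"), ("want", "VERB"), ("coffee", "NOUN"),
          ("i", "PRON"), ("want", "VERB"), ("tea", "NOUN")]

bigram = defaultdict(int)     # counts of (previous class, class)
context = defaultdict(int)    # counts of class contexts
emission = defaultdict(int)   # counts of (class, word)
cls_total = defaultdict(int)  # counts of each class (emission denominator)

prev = "<s>"
for word, cls in corpus:
    bigram[(prev, cls)] += 1
    context[prev] += 1
    emission[(cls, word)] += 1
    cls_total[cls] += 1
    prev = cls

def prob(word, cls, prev_cls):
    """Class-based bigram probability: p(c | c_prev) * p(w | c)."""
    p_class = bigram[(prev_cls, cls)] / context[prev_cls]
    p_word = emission[(cls, word)] / cls_total[cls]
    return p_class * p_word

# p(NOUN | VERB) = 2/2 = 1.0; p(coffee | NOUN) = 1/2 = 0.5
print(prob("coffee", "NOUN", "VERB"))  # → 0.5
```

Because the class vocabulary is much smaller than the word vocabulary, the class-transition counts are far less sparse than word-bigram counts, which is what makes the quality of the classing matter: this is a hard-classing sketch (one class per word); soft classing would sum over multiple candidate classes per word.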
