Conference on Intelligent Text Processing and Computational Linguistics

Class-Based Language Models for Chinese-English Parallel Corpus*

Abstract

This paper addresses the use of novel class-based language models on parallel corpora, focusing specifically on English and Chinese. We find that the perplexity of Chinese is generally much higher than that of English and discuss possible reasons. We demonstrate the relative effectiveness of class-based models over the modified Kneser-Ney trigram model for our task. We also introduce rare-events clustering and a polynomial discounting mechanism, which are shown to improve results. Our experimental results on parallel corpora indicate that the improvements due to classes are similar for English and Chinese. This suggests that class-based language models should be used for both languages.
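
For readers unfamiliar with the terms, the following is a minimal sketch of the textbook class-based bigram factorization, P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), together with a perplexity computation. It is not the paper's model: the abstract does not specify the novel class-based models, the rare-events clustering, or the polynomial discounting, so none of those are reproduced here. The toy corpus, the hand-assigned `word2class` mapping, and all function names are assumptions made for illustration, and the estimates are unsmoothed maximum-likelihood counts with no end-of-sentence token.

```python
import math
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Collect the counts needed for P(c_i | c_{i-1}) and P(w_i | c_i)."""
    class_bigrams = Counter()  # (previous class, current class) counts
    class_context = Counter()  # how often each class appears as a context
    word_counts = Counter()    # word counts
    class_counts = Counter()   # class counts
    for sent in sentences:
        prev = "<s>"  # sentence-start pseudo-class
        for w in sent:
            c = word2class[w]
            class_bigrams[(prev, c)] += 1
            class_context[prev] += 1
            word_counts[w] += 1
            class_counts[c] += 1
            prev = c
    return class_bigrams, class_context, word_counts, class_counts

def prob(w, prev_class, model, word2class):
    """Unsmoothed P(w | previous class) = P(c | previous class) * P(w | c)."""
    class_bigrams, class_context, word_counts, class_counts = model
    c = word2class[w]
    p_class = class_bigrams[(prev_class, c)] / class_context[prev_class]
    p_word = word_counts[w] / class_counts[c]
    return p_class * p_word

def perplexity(sentences, model, word2class):
    """Perplexity = 2 ** (negative average log2 probability per word)."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        prev = "<s>"
        for w in sent:
            log_sum += math.log2(prob(w, prev, model, word2class))
            n += 1
            prev = word2class[w]
    return 2 ** (-log_sum / n)

# Hypothetical toy data: classes are assigned by hand, not learned.
word2class = {"the": "DET", "a": "DET", "cat": "N", "dog": "N",
              "runs": "V", "sleeps": "V"}
sentences = [["the", "cat", "runs"], ["a", "dog", "sleeps"]]
model = train_class_bigram(sentences, word2class)

train_ppl = perplexity(sentences, model, word2class)
# "a cat" and "cat sleeps" never occur as word bigrams in the toy corpus,
# but their class sequence DET N V does, so the probability stays nonzero.
unseen_ppl = perplexity([["a", "cat", "sleeps"]], model, word2class)
```

Because probability mass is shared within a class, the sketch assigns nonzero probability to word sequences it has never seen, which is the usual motivation for class-based models on sparse data. A practical system would add smoothing (for example, the modified Kneser-Ney discounting used by the paper's baseline) so that unseen classes or contexts do not produce zero probabilities.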