Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Show Some Love to Your n-grams: A Bit of Progress and Stronger n-gram Language Modeling Baselines



Abstract

In recent years neural language models (LMs) have set state-of-the-art performance on several benchmark datasets. While the reasons for their success and their computational demands are well documented, a comparison between neural models and more recent developments in n-gram models has been neglected. In this paper, we examine the recent progress in the n-gram literature, running experiments on 50 languages covering all morphological language families. Experimental results illustrate that a simple extension of Modified Kneser-Ney outperforms an LSTM language model on 42 languages, while a word-level Bayesian n-gram LM (Shareghi et al., 2017) outperforms the character-aware neural model (Kim et al., 2016) on average across all languages, and its extension which explicitly injects linguistic knowledge (Gerz et al., 2018a) on 8 languages. Further experiments on larger Europarl datasets for 3 languages indicate that neural architectures are able to outperform the computationally much cheaper n-gram models: n-gram training is up to 15,000× quicker. Our experiments illustrate that standalone n-gram models lend themselves as natural choices for resource-lean or morphologically rich languages, while the recent progress has significantly improved their accuracy.
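The abstract's baseline builds on (Modified) Kneser-Ney smoothing. As a minimal illustration of the core idea — absolute discounting of observed bigram counts interpolated with a "continuation" unigram probability — here is a sketch of an interpolated Kneser-Ney bigram model. This is an assumption-laden toy (single fixed discount, bigram order only, no handling of the full Modified variant with multiple discounts), not the paper's implementation:

```python
from collections import Counter

def kneser_ney_bigram(tokens, discount=0.75):
    """Build a toy interpolated Kneser-Ney bigram model from a token list.

    Returns a function prob(word, context) estimating P(word | context).
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    history_counts = Counter(tokens[:-1])             # c(u): count of u as a bigram history
    continuation = Counter(w for (_, w) in bigrams)   # N1+(·, w): distinct left contexts of w
    followers = Counter(u for (u, _) in bigrams)      # N1+(u, ·): distinct continuations of u
    total_bigram_types = len(bigrams)                 # N1+(·, ·)

    def prob(word, context):
        c_uw = bigrams[(context, word)]
        c_u = history_counts[context]
        p_cont = continuation[word] / total_bigram_types
        if c_u == 0:
            return p_cont  # unseen history: fall back to continuation probability
        # Discounted bigram estimate, interpolated with the continuation unigram
        lam = discount * followers[context] / c_u
        return max(c_uw - discount, 0.0) / c_u + lam * p_cont

    return prob
```

The continuation probability is what distinguishes Kneser-Ney from plain absolute discounting: a word is rewarded for appearing after many *distinct* histories rather than for raw frequency. By construction, the distribution over a seen history sums to one.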

