
Converting Continuous-Space Language Models into N-gram Language Models with Efficient Bilingual Pruning for Statistical Machine Translation


Abstract

The Language Model (LM) is an essential component of Statistical Machine Translation (SMT). In this article, we focus on developing efficient methods for LM construction. Our main contribution is a Natural N-grams based Converting (NNGC) method for transforming a Continuous-Space Language Model (CSLM) into a Back-off N-gram Language Model (BNLM). Furthermore, a Bilingual LM Pruning (BLMP) approach is developed to enhance LMs in SMT decoding and to speed up CSLM conversion. The proposed pruning and converting methods work jointly to convert a large LM efficiently: an LM can be effectively pruned before it is converted from the CSLM without sacrificing performance, and can be further improved if an additional corpus contains out-of-domain information. Across different SMT tasks, our experimental results indicate that the proposed NNGC and BLMP methods significantly outperform their existing counterparts in both BLEU and computational cost.
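
The record gives only the abstract, but the conversion idea it describes can be illustrated concretely. The sketch below, in Python, shows the general shape of converting a continuous-space LM into a back-off N-gram LM: enumerate the n-grams that actually occur in a corpus (the "natural" n-grams the NNGC name suggests), score each with the CSLM, and write the result in ARPA back-off format. Everything here is an illustrative assumption rather than the paper's implementation: cslm_prob is a toy stand-in for a trained neural LM, and the back-off weights are fixed stubs instead of properly renormalized values.

    import math
    from collections import defaultdict

    # Hypothetical stand-in for a trained continuous-space LM (e.g., a
    # feed-forward neural LM); a toy uniform distribution so the sketch runs.
    def cslm_prob(word, history, vocab_size=4):
        return 1.0 / vocab_size

    def convert_to_arpa(corpus, order=2, out_path="converted.arpa"):
        # Collect the "natural" n-grams, i.e., those observed in the data,
        # rather than enumerating the full V^n space.
        ngrams = defaultdict(set)
        for sent in corpus:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            for n in range(1, order + 1):
                for i in range(len(tokens) - n + 1):
                    ngrams[n].add(tuple(tokens[i:i + n]))
        # Rescore each n-gram with the CSLM and emit an ARPA back-off model.
        with open(out_path, "w") as f:
            f.write("\\data\\\n")
            for n in range(1, order + 1):
                f.write(f"ngram {n}={len(ngrams[n])}\n")
            for n in range(1, order + 1):
                f.write(f"\n\\{n}-grams:\n")
                for gram in sorted(ngrams[n]):
                    logp = math.log10(cslm_prob(gram[-1], gram[:-1]))
                    # Stub back-off weight of 0.0 for non-highest orders;
                    # a real converter would renormalize these.
                    backoff = "\t0.0" if n < order else ""
                    f.write(f"{logp:.6f}\t{' '.join(gram)}{backoff}\n")
            f.write("\n\\end\\\n")

    convert_to_arpa(["the cat sat", "the dog sat"])

A real converter in the spirit of the abstract would also prune this n-gram set (for example, with a bilingual criterion) before scoring, which is what makes converting a large CSLM tractable.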

Bibliographic Details

  • Source: ACM Transactions on Asian Language Information Processing
  • Author Affiliations

    Center for Brain-like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, China, 200240;

    Multilingual Translation Laboratory, National Institute of Information and Communications Technology, 3-5 Hikaridai, Keihanna Science City, Kyoto 619-0289, Japan;

    NHK and National Institute of Information and Communications Technology, NHK Science & Technology Research Laboratories, 1-10-11 Kinuta, Setagaya-ku, Tokyo 157-8510, Japan;

    Multilingual Translation Laboratory, National Institute of Information and Communications Technology, 3-5 Hikaridai, Keihanna Science City, Kyoto 619-0289, Japan;

    Center for Brain-like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, China, 200240;

    Center for Brain-like Computing and Machine Intelligence, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai, China, 200240;

  • Indexing Information
  • Original Format: PDF
  • Language: English
  • CLC Classification
  • Keywords

    Machine translation; continuous-space language model; neural network language model; language model pruning

  • Date Added: 2022-08-17 13:41:10
