Conference: International Conference on Advanced Informatics: Concepts, Theory and Applications

Effect of linguistic information in neural machine translation

Abstract

Deep Neural Networks (DNNs) outperform earlier approaches in many fields, including natural language processing. Neural Machine Translation (NMT) likewise outperforms Statistical Machine Translation (SMT), which relies on complex features and rules. However, NMT requires a large corpus and long computation time. To reduce computational cost, recent studies have replaced low-frequency words with placeholder symbols. These symbols, however, make sentences ambiguous and degrade translation accuracy. To solve this problem, sub-word methods such as Byte Pair Encoding (BPE) and the Wordpiece Model (WPM), which build a vocabulary of a prespecified size, have been proposed. These tokenization methods nevertheless break words apart and treat the pieces as symbols. Such symbols are compatible with neural networks, and NMT performance has improved. This result suggests that linguistic correctness is not necessarily important in NMT. If so, to what extent does linguistic correctness contribute to NMT accuracy? In this research, we incorporate linguistic information into sub-word units. Our experiments show that morphemes, as linguistic information, are a helpful factor for sub-word units.
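To make the sub-word idea in the abstract concrete, the following is a minimal sketch of plain BPE merge learning, in which the number of merges stands in for the prespecified vocabulary size. It is not the method proposed in the paper: the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, the tie-breaking rule, and the toy word counts are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} mapping."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # (an arbitrary but deterministic choice made for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split a word into sub-word units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus statistics (hypothetical counts).
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
toy_merges = learn_bpe(toy_counts, num_merges=3)
toy_segments = segment("lowest", toy_merges)
```

With these toy counts, three merges are enough to build the unit `est</w>`, so the unseen word `lowest` is segmented as `l o w est</w>` instead of being replaced by a single unknown-word symbol. The merges follow frequency alone, which is why the resulting units need not coincide with morphemes.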
