Journal: ACM Transactions on Asian and Low-Resource Language Information Processing

Source-side Reordering to Improve Machine Translation between Languages with Distinct Word Orders


Abstract

English and Hindi have significantly different word orders. English follows subject-verb-object (SVO) order, while Hindi primarily follows subject-object-verb (SOV) order. This difference makes the language pair challenging to model for translation. In phrase-based translation systems, word reordering is governed by the language model, the phrase table, and reordering models. Reordering in such systems is generally achieved during decoding by transposing words within a defined window. These systems can handle local reorderings, and some phrase-level reorderings are captured when phrases are formed, but they are weak at learning long-distance reorderings. To overcome this weakness, researchers have used reordering as a pre-processing step that brings the source sentence closer to the target language in word order. Such approaches rely on part-of-speech (POS) tag sequences, on reordering the syntax tree with grammatical rules, or on head finalization. This study shows that head finalization alone is not sufficient for reordering sentences in the English-Hindi language pair. It describes various grammatical constructs and presents a comparative evaluation of reorderings against the original and the head-finalized representations. The impact of reordering on translation quality is measured through the BLEU score in phrase-based statistical and neural machine translation systems. A significant gain in BLEU score was noted for reorderings in different grammatical constructs.
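
To make the head-finalization baseline mentioned in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Node` class, the hand-built parse tree, and the hand-assigned head indices are all assumptions made for illustration. The idea is only that, at every node of a source-side syntax tree, the head child is moved to the end of its siblings, which pushes an English verb behind its object.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical parse-tree node (illustrative only)."""
    label: str
    word: Optional[str] = None                 # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)
    head: int = 0                              # index of the head child


def head_finalize(node: Node) -> List[str]:
    """Return the words of the subtree with each head child moved last."""
    if node.word is not None:
        return [node.word]
    # Keep non-head children in their original order, then append the head.
    non_head = [c for i, c in enumerate(node.children) if i != node.head]
    ordered = non_head + [node.children[node.head]]
    words: List[str] = []
    for child in ordered:
        words.extend(head_finalize(child))
    return words


# Hand-built tree for "John ate an apple"; head indices are assumed.
tree = Node("S", head=1, children=[
    Node("NP", head=0, children=[Node("NNP", word="John")]),
    Node("VP", head=0, children=[
        Node("VBD", word="ate"),
        Node("NP", head=1, children=[
            Node("DT", word="an"),
            Node("NN", word="apple"),
        ]),
    ]),
])

reordered = head_finalize(tree)
print(" ".join(reordered))  # John an apple ate
```

On this toy sentence the output follows SOV order. The abstract's claim is that this rule alone does not suffice for English-Hindi across the grammatical constructs the study examines; the sketch shows only the baseline, not the additional reorderings evaluated in the paper.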
