Approximate sentence retrieval for scalable and efficient example-based machine translation

Abstract

Approximate sentence matching (ASM) is an important technique for machine translation (MT) tasks such as example-based MT (EBMT), where it influences both translation time and the quality of the translation output. We investigate different approaches to finding similar sentences in an example base and evaluate their efficiency (runtime), their effectiveness, and the resulting quality of the translation output. A comparison of approaches demonstrates that i) sequentially computing the edit distance between an input sentence and every sentence in the example base is not feasible, even when efficient edit distance algorithms are employed; ii) in-memory data structures such as tries and ternary search trees are more efficient in terms of runtime, but do not scale to large example bases; iii) standard IR models, which only cover material similarity (e.g. term overlap), do not perform well at finding approximate matches because they do not handle word order and word positions. We propose a new retrieval model derived from language modelling (LM), named LM-ASM, which includes positional and ordinal similarities in the matching process in addition to material similarity. Our IR-based retrieval experiments involve reranking the top-ranked documents by their true edit distance score. Experimental results show that i) IR-based approaches result in about 100 times faster translation; ii) LM-ASM approximates edit distance better than standard LM by about 10%; and iii) surprisingly, LM-ASM even improves MT quality by 1.52% in comparison to sequential edit distance computation.
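
The retrieve-then-rerank idea described in the abstract can be illustrated with a minimal Python sketch. It is not the LM-ASM model: the first pass below uses plain term overlap as a stand-in for an IR ranking, and the names `edit_distance`, `term_overlap`, `retrieve_and_rerank`, and the toy example base are all hypothetical choices made for illustration. Only the second pass, reranking a shortlist by true word-level edit distance, mirrors what the abstract states.

```python
from collections import Counter


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def term_overlap(a, b):
    """Number of shared tokens (multiset intersection); ignores word order."""
    return sum((Counter(a) & Counter(b)).values())


def retrieve_and_rerank(query, example_base, k=3):
    """Shortlist k sentences by term overlap, then rerank by edit distance.

    Assumption: term overlap stands in for the IR first pass; the paper's
    LM-ASM scoring is not reproduced here.
    """
    q = query.split()
    tokenised = [s.split() for s in example_base]
    # Stage 1: cheap, order-insensitive ranking over the whole example base.
    shortlist = sorted(range(len(tokenised)),
                       key=lambda i: -term_overlap(q, tokenised[i]))[:k]
    # Stage 2: exact edit distance, computed only for the shortlist.
    scored = [(example_base[i], edit_distance(q, tokenised[i]))
              for i in shortlist]
    return sorted(scored, key=lambda pair: pair[1])


# Toy example base (hypothetical data).
EXAMPLE_BASE = [
    "the cat sat on the mat",
    "mat the on sat cat the",  # same words as above, scrambled order
    "the cat sat on a mat",
    "dogs bark loudly",
]

if __name__ == "__main__":
    for sentence, dist in retrieve_and_rerank("the cat sat on the rug",
                                              EXAMPLE_BASE):
        print(dist, sentence)
```

In this toy example base the scrambled sentence ties for the highest term overlap with the query, yet it drops to the bottom of the shortlist once edit distance is applied. That is the weakness of overlap-only matching the abstract points to, and the reason word order and position need to enter the ranking.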
