Approximate sentence retrieval for scalable and efficient example-based machine translation

Abstract

Approximate sentence matching (ASM) is an important technique for tasks in machine translation (MT) such as example-based MT (EBMT), and it influences both the translation time and the quality of translation output. We investigate different approaches to find similar sentences in an example base and evaluate their efficiency (runtime), effectiveness, and the resulting quality of translation output. A comparison of approaches demonstrates that i) a sequential computation of the edit distance between an input sentence and all sentences in the example base is not feasible, even when efficient algorithms to compute the edit distance are employed; ii) in-memory data structures such as tries and ternary search trees are more efficient in terms of runtime, but are not scalable for large example bases; iii) standard IR models, which only cover material similarity (e.g. term overlap), do not perform well in finding approximate matches because they do not handle word order and word positions. We propose a new retrieval model derived from language modelling (LM), named LM-ASM, to include positional and ordinal similarities in the matching process, in addition to material similarity. Our IR-based retrieval experiments involve reranking the top-ranked documents based on their true edit distance score. Experimental results show that i) IR-based approaches result in about 100 times faster translation; ii) LM-ASM approximates edit distance better than standard LM by about 10%; and iii) surprisingly, LM-ASM even improves MT quality by 1.52% in comparison to sequential edit distance computation.
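
The contrast the abstract draws between sequential edit distance computation and IR-based retrieval followed by reranking can be illustrated with a minimal Python sketch. This is not the authors' LM-ASM model: the first stage here is a plain term-overlap ranking (material similarity only), used as a stand-in assumption, and the function names `edit_distance`, `sequential_search`, and `retrieve_then_rerank` as well as the `top_k` parameter are hypothetical.

```python
from collections import Counter


def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sequential_search(query, example_base):
    """Baseline: compute the edit distance against every stored sentence."""
    q = query.split()
    return min(example_base, key=lambda s: edit_distance(q, s.split()))


def retrieve_then_rerank(query, example_base, top_k=3):
    """Stage 1: rank by term overlap (assumed stand-in for an IR model).
    Stage 2: rerank only the top_k candidates by true edit distance."""
    q = query.split()
    q_counts = Counter(q)

    def overlap(sentence):
        # Size of the multiset intersection of query and sentence tokens.
        return sum((q_counts & Counter(sentence.split())).values())

    candidates = sorted(example_base, key=overlap, reverse=True)[:top_k]
    return min(candidates, key=lambda s: edit_distance(q, s.split()))
```

In this sketch, for the query "the cat sat on the red mat", the sentences "the cat sat on the mat" and "the mat sat on the cat" receive the same term-overlap score even though their word-level edit distances to the query differ. That is the word-order limitation the abstract attributes to standard IR models, and the reason the second stage reranks candidates by true edit distance.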

Bibliographic details

Authors
Ganguly Debasis; Leveling Johannes; Dandapat Sandipan; Jones Gareth J.F.
Year 2012
Original format PDF
Language en

Similar literature

1. Example-Based Machine Translation Using Efficient Sentence Retrieval Based on Edit-Distance [J]. Takao Doi, Hirofumi Yamamoto, Eiichiro Sumita. ACM Transactions on Asian Language Information Processing. 2005, issue 4
2. The Implementation of the Example-Based Machine Translation Technique for Greek-to-Polish Machine Translation System [J]. Miroslaw Gajer. Foundations of Computing and Decision Sciences. 2003, issue 2
3. Analogical-Based Translation Hypothesis Derivation with Structural Semantics for English to Malay Example-Based Machine Translation [J]. Advanced Science Letters. 2018, issue 2
4. Approximate Sentence Retrieval for Scalable and Efficient Example-based Machine Translation [C]. Johannes Leveling, Debasis Ganguly, Sandipan Dandapat. International Conference on Computational Linguistics. 2012
5. Coping with Data-sparsity in Example-based Machine Translation [D]. Gangadharaiah, Rashmi. 2011
6. Using approximate Bayesian computation for estimating parameters in the cue-based retrieval model of sentence processing [O]. Shravan Vasishth. 2020
7. Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation [O]. Rui Wang, Masao Utiyama, Eiichiro Sumita. 2018
8. Inverted File Tree Machine: Efficient Multi-Key Retrieval for VLSI (Very Large Scale Integration) [R]. Kriegel, H. P., Mannss, R., Overmars, M. 1985
