Journal: BioTechnology: An Indian Journal

Research on machine translation based on key technologies of bilingual corpus



Abstract

With the development of statistical natural language processing, parallel corpora have come to play an essential role in statistical machine translation and cross-language retrieval. In this paper, we examine how translation equivalent pairs can be extracted from a parallel corpus. An iterative algorithm based on the degree of word association is proposed to identify multiword units in Chinese and English. A hypothesis-testing approach is then used to extract Chinese-English translation equivalent pairs. We also present a tree-tree model built by mapping between the syntactic tree and the ITG tree; the model constrains phrase reordering in the global scope. In the local scope, the tree-tree model takes the ITG-based local reordering model as one feature, in which the reordering probability of two blocks is decomposed into the product of the reordering probabilities of their child blocks. The model can therefore estimate the reordering of two blocks of arbitrary length.
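
The abstract does not say which hypothesis test is used to extract translation equivalent pairs. As a rough illustration only, the sketch below assumes a Pearson chi-square test on sentence-level co-occurrence counts, with the 0.05 critical value for one degree of freedom (3.841) as the cutoff. The function names `chi_square` and `extract_pairs`, the threshold, and the romanized toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def extract_pairs(bitext, threshold=3.841):
    """Return {(src_word, tgt_word): score} for positively associated pairs.

    bitext is a list of (source_tokens, target_tokens) sentence pairs.
    The threshold is an assumed cutoff (chi-square, df=1, p=0.05).
    """
    n = len(bitext)
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        s_set, t_set = set(src), set(tgt)  # count each word once per sentence
        src_count.update(s_set)
        tgt_count.update(t_set)
        pair_count.update(product(s_set, t_set))

    pairs = {}
    for (s, t), a in pair_count.items():
        b = src_count[s] - a   # sentences with s but not t
        c = tgt_count[t] - a   # sentences with t but not s
        d = n - a - b - c      # sentences with neither
        score = chi_square(a, b, c, d)
        # Keep only positive associations that pass the assumed threshold.
        if a * d > b * c and score >= threshold:
            pairs[(s, t)] = score
    return pairs


# Toy corpus with romanized Chinese tokens (illustrative data only).
bitext = [
    (["jiqi", "fanyi"], ["machine", "translation"]),
    (["jiqi", "xuexi"], ["machine", "learning"]),
    (["tongji", "fanyi"], ["statistical", "translation"]),
    (["tongji", "xuexi"], ["statistical", "learning"]),
]
pairs = extract_pairs(bitext)
```

On this toy corpus, each word co-occurs with its true counterpart in both of its sentences and with every other word in at most one, so only the four correct pairs pass the test. With real data, multiword units found in the earlier step would presumably be treated as single tokens before counting.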
