4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, 2011

Learning the Optimal use of Dependency-parsing Information for Finding Translations with Comparable Corpora



Abstract

Using comparable corpora to find new word translations is a promising approach for extending bilingual dictionaries (semi-)automatically. The approach rests on the assumption that similar words have similar contexts across languages. The context of a word is often summarized by the bag of words in the sentence, or by the words in certain dependency positions, e.g. the predecessors and successors. These different context positions are then combined into one context vector and compared across languages. However, previous research makes the (implicit) assumption that these context positions should be weighted as equally important. Furthermore, only identical context positions are compared with each other; for example, the successor position in Spanish is compared with the successor position in English. This is not necessarily appropriate for language pairs such as Japanese and English. To overcome these limitations, we propose applying a linear transformation, defined by a matrix, to the context vectors. We define the optimal transformation matrix using a Bayesian probabilistic model, and show that it is feasible to find an approximate solution using Markov chain Monte Carlo methods. Our experiments demonstrate that the proposed method consistently improves translation accuracy.
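The idea of transforming a context vector with a matrix before comparing it across languages can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`expand_position_matrix`, `transform`, `cosine`), the two-position layout, the toy counts, and the hand-picked matrices are all assumptions made for illustration, and the Bayesian model and Markov chain Monte Carlo search that the abstract uses to find the matrix are not shown.

```python
import math

def expand_position_matrix(position_weights, dim):
    """Expand a k x k matrix over context positions into a (k*dim) x (k*dim)
    matrix acting on concatenated context vectors (hypothetical layout:
    one block of `dim` features per position)."""
    k = len(position_weights)
    size = k * dim
    full = [[0.0] * size for _ in range(size)]
    for i in range(k):
        for j in range(k):
            for a in range(dim):
                # Feature a in position j contributes to feature a in position i.
                full[i * dim + a][j * dim + a] = position_weights[i][j]
    return full

def transform(matrix, vector):
    """Apply a linear transformation (matrix-vector product)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy vectors: [predecessor features..., successor features...], dim = 2.
# Assumed scenario: the source word's context sits in the predecessor
# position, while its translation's context sits in the successor position.
source = [3.0, 1.0, 0.0, 0.0]
target = [0.0, 0.0, 3.0, 1.0]

# Identity: each position is compared only with itself, equally weighted.
identity = expand_position_matrix([[1.0, 0.0], [0.0, 1.0]], dim=2)
# Hand-picked alternative: map predecessor onto successor and vice versa.
swap = expand_position_matrix([[0.0, 1.0], [1.0, 0.0]], dim=2)

sim_identity = cosine(transform(identity, source), target)
sim_swap = cosine(transform(swap, source), target)
print(sim_identity, sim_swap)
```

With the identity matrix the two toy vectors share no mass and the similarity is zero; with the position-swapping matrix they align. In the setting the abstract describes, the matrix would be learned rather than chosen by hand.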
