Panlingual Lexical Translation via Probabilistic Inference

Abstract

The bare minimum lexical resource required to translate between a pair of languages is a translation dictionary. Unfortunately, dictionaries exist for only a tiny fraction of the 49 million possible language pairs, making machine translation virtually impossible between most languages. This paper summarizes the last four years of our research, motivated by the vision of panlingual communication. Our research comprises three key steps. First, we compile over 630 freely available dictionaries from the Web and convert this data into a single representation: the translation graph. Second, we build several inference algorithms that infer translations between word pairs even when no dictionary lists them as translations. Finally, we run our inference procedure offline to construct PanDictionary, a sense-distinguished, massively multilingual dictionary that has translations in more than 1000 languages. Our experiments assess the quality of this dictionary and find that, at a high precision of 0.9, it has 4 times as many translations as the English Wiktionary, which is the lexical resource closest to PanDictionary.
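
The idea of inferring a translation that no single dictionary lists can be illustrated with a toy example. The sketch below is not the paper's algorithm: the function names, the `edge_prob` value, the two-hop limit, and the assumption that paths are independent (combined with a noisy-or) are all illustrative choices, and word senses are ignored entirely.

```python
from collections import defaultdict
from itertools import combinations


def build_translation_graph(dictionaries):
    """Merge dictionaries into one undirected graph.

    Each dictionary is a list of entries; each entry is a list of
    (language, word) nodes that the dictionary lists as translations
    of one another.
    """
    graph = defaultdict(set)
    for entries in dictionaries:
        for entry in entries:
            for a, b in combinations(entry, 2):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def infer_translations(graph, source, target_lang, edge_prob=0.8):
    """Score target-language words within two hops of `source`.

    Assumption (hypothetical, for illustration): every edge is correct
    with probability `edge_prob`, paths are independent, and evidence
    from several paths is combined with a noisy-or.
    """
    miss = defaultdict(lambda: 1.0)  # P(no path supports the candidate)
    for mid in graph[source]:
        if mid[0] == target_lang:
            miss[mid] *= 1 - edge_prob  # direct dictionary entry
        for cand in graph[mid]:
            if cand != source and cand[0] == target_lang:
                miss[cand] *= 1 - edge_prob ** 2  # two-hop path via `mid`
    return {cand: 1 - m for cand, m in miss.items()}


# Four toy dictionaries; none of them pairs English with Spanish.
en_fr = [[("en", "spring"), ("fr", "printemps")]]
fr_es = [[("fr", "printemps"), ("es", "primavera")]]
en_de = [[("en", "spring"), ("de", "Frühling")]]
de_es = [[("de", "Frühling"), ("es", "primavera")]]

graph = build_translation_graph([en_fr, fr_es, en_de, de_es])
scores = infer_translations(graph, ("en", "spring"), "es")
print(scores)
```

Here "primavera" is reached through both the French and the German node, so its score is higher than a single path would give. Because the sketch ignores senses, a polysemous pivot word could just as easily link "spring" to a word for a metal coil, which is why a sense-distinguished dictionary requires more than path counting.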