Journal: ACM Transactions on Asian Language Information Processing

Inducing a Bilingual Lexicon from Short Parallel Multiword Sequences

Abstract

This article proposes a technique for mining bilingual lexicons from pairs of parallel short word sequences. The technique builds a generative model from a corpus of training data consisting of such pairs. The model is a hierarchical nonparametric Bayesian model that directly induces a bilingual lexicon while training. The model learns in an unsupervised manner and is designed to exploit characteristics of the language pairs being mined. The proposed model is capable of utilizing commonly used word-pair frequency information and can additionally employ the internal character alignments within the words themselves. It is thereby capable of mining transliterations and can use reliably aligned transliteration pairs to support the mining of other words in their context. The model is also capable of performing word reordering and word deletion during the alignment process, and it is furthermore capable of operating in the absence of full segmentation information. In this work, we study two mining tasks based on English-Japanese and English-Chinese language pairs, and compare the proposed approach to baselines based on simpler models that use only word-pair frequency information. Our results show that the proposed method is able to mine bilingual word pairs at higher levels of precision and recall than the baselines.
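To make the phrase "word-pair frequency information" concrete, the sketch below scores co-occurring word pairs with the Dice coefficient and keeps those above a threshold. This is an illustrative stand-in only: the abstract does not specify how its baselines are built, and this is not the proposed hierarchical Bayesian model. The function names `dice_scores` and `mine_lexicon`, the `threshold` value, and the toy English-Japanese data are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def dice_scores(pairs):
    """Score each co-occurring (source, target) word pair with the Dice coefficient.

    `pairs` is a list of (source_words, target_words) tuples, one per
    parallel sequence. Counts are per sequence, not per token.
    """
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src_words, tgt_words in pairs:
        src_set, tgt_set = set(src_words), set(tgt_words)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        # Every source word co-occurs with every target word in the pair.
        joint_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint_count.items()
    }


def mine_lexicon(pairs, threshold=0.8):
    """Return word pairs whose Dice score meets the (assumed) threshold."""
    scores = dice_scores(pairs)
    return sorted(pair for pair, score in scores.items() if score >= threshold)


# Hypothetical toy corpus of short parallel multiword sequences.
toy_pairs = [
    (["tokyo", "tower"], ["東京", "タワー"]),
    (["tokyo", "station"], ["東京", "駅"]),
    (["kyoto", "station"], ["京都", "駅"]),
    (["kyoto", "tower"], ["京都", "タワー"]),
]
lexicon = mine_lexicon(toy_pairs)
```

A frequency-only scorer like this has no view of the characters inside a word, so it cannot recognise that a word pair is a transliteration. Per the abstract, the proposed model adds exactly that signal, together with word reordering, word deletion, and operation without full segmentation.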