Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2005); May 18-20, 2005; Hanoi (VN)

Automatic Extraction of Low Frequency Bilingual Word Pairs from Parallel Corpora with Various Languages


Abstract

In this paper, we propose a new learning method for extracting low-frequency bilingual word pairs from parallel corpora in various languages. Extracting low-frequency bilingual word pairs is important because the frequencies of many bilingual word pairs are very low when large-scale parallel corpora are unobtainable. We use the following inference to extract low-frequency bilingual word pairs: in local parts of bilingual sentence pairs, the word equivalents that adjoin the source-language words of bilingual word pairs also adjoin the target-language words of those pairs. Evaluation experiments indicated that the extraction rate of our system was more than 8.0 percentage points higher than that of the system based on the Dice coefficient. Moreover, using AIL, the extraction rates of bilingual word pairs with frequencies of one and two improved by 11.0 and 6.6 percentage points, respectively.
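As a rough illustration of the Dice-coefficient baseline named in the abstract, the sketch below scores each co-occurring word pair as 2 * f(s, t) / (f(s) + f(t)), counting each word at most once per sentence. The toy corpus, the function name `dice_scores`, and the whitespace tokenization are assumptions made for this example; they are not taken from the paper, and the sketch does not reproduce the authors' system or their proposed method.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Score every co-occurring (source, target) word pair with the Dice coefficient."""
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        # Count each word at most once per sentence (sentence-level frequency).
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        pair_freq.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


# Hypothetical toy parallel corpus (English / French), for illustration only.
corpus = [
    ("the black cat sleeps", "le chat noir dort"),
    ("the dog sleeps", "le chien dort"),
    ("the dog eats", "le chien mange"),
]

scores = dice_scores(corpus)
print(scores[("dog", "chien")])  # pair seen twice, both words seen twice
print(scores[("cat", "chat")])   # correct pair, seen once
print(scores[("cat", "noir")])   # incorrect pair, also seen once
```

In this toy corpus, "cat", "chat", and "noir" each occur in only one sentence pair, so the correct pair ("cat", "chat") and the incorrect pair ("cat", "noir") receive the same score. This is one way to see why purely frequency-based scoring struggles with pairs whose frequency is one or two, which is the case the abstract targets.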
