Journal: Natural Language Engineering

Linguistic knowledge in statistical phrase-based word alignment


Abstract

This paper presents a novel phrase alignment strategy that combines linguistic knowledge with cooccurrence measures extracted from bilingual corpora. The algorithm has four steps: phrase selection and classification, phrase alignment, one-to-one word alignment, and postprocessing. The first stage selects a linguistically derived set of phrases that convey a unified meaning during translation and are therefore aligned together in parallel texts. These phrases include verb phrases, idiomatic expressions and date expressions. The second stage produces very high precision links between the selected phrases in the two languages. The third stage performs a statistical word alignment of the remaining unaligned tokens, using association measures and link probabilities. The fourth stage makes final decisions on tokens that are still unaligned, based on linguistic knowledge. Experiments are reported for an English-Spanish parallel corpus, with a detailed description of the evaluation measure and the manual reference used. Results show that phrase cooccurrence measures convey information complementary to word cooccurrences and provide stronger evidence of a correct alignment, successfully introducing linguistic knowledge into a statistical word alignment scheme. Precision, Recall and Alignment Error Rate (AER) results are presented, and they outperform state-of-the-art alignment algorithms.
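The abstract names Precision, Recall and AER but does not spell out how they are computed. The sketch below uses the definitions commonly applied in word alignment evaluation, where the manual reference marks links as sure (S) or possible (P, taken here to include S). Whether the paper uses exactly this variant is an assumption, and the function name `alignment_scores` and the index-pair link representation are hypothetical choices made for illustration.

```python
from typing import Set, Tuple

# A link pairs a source-token index with a target-token index (assumed representation).
Link = Tuple[int, int]


def alignment_scores(
    hypothesis: Set[Link], sure: Set[Link], possible: Set[Link]
) -> Tuple[float, float, float]:
    """Return (precision, recall, AER) for one set of hypothesised links.

    Assumed definitions, with A = hypothesis, S = sure, P = possible ∪ sure:
      precision = |A ∩ P| / |A|
      recall    = |A ∩ S| / |S|
      AER       = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
    """
    if not hypothesis or not sure:
        raise ValueError("hypothesis and sure link sets must be non-empty")

    # Sure links always count as possible links as well.
    possible = possible | sure

    hits_possible = len(hypothesis & possible)
    hits_sure = len(hypothesis & sure)

    precision = hits_possible / len(hypothesis)
    recall = hits_sure / len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / (len(hypothesis) + len(sure))
    return precision, recall, aer


# Toy example with made-up links, not data from the paper.
hypothesis = {(0, 0), (1, 1), (2, 3)}
sure = {(0, 0), (1, 1)}
possible = {(2, 2)}
precision, recall, aer = alignment_scores(hypothesis, sure, possible)
```

In this toy case two of the three hypothesised links match the reference, so precision is 2/3, recall is 1, and AER is about 0.2. Separating sure from possible links lets an aligner propose an ambiguous link without a recall penalty when it is missing.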
