A Robust Cross-Style Bilingual Sentences Alignment Model

Abstract

Most current sentence alignment approaches use sentence length and cognates as alignment features, and they are mostly trained and tested on documents of the same style. Because the length distribution, the alignment-type distribution (used by length-based approaches), and the cognate frequency vary significantly across texts of different styles, length-based approaches fail to achieve similar performance when tested on corpora of a different style. Experiments show that the F-measure can drop from 98.2% to 85.6% when a length-based approach is trained on a technical manual and then tested on a general magazine. Because a large percentage of the content words in the source text are translated into corresponding translation duals to preserve the meaning in the target text, transfer lexicons are usually regarded as more reliable cues when sentences are aligned by humans. To enhance robustness, this paper proposes a statistical model based on both transfer lexicons and sentence lengths. After the transfer lexicons are integrated into the model, a 60% reduction in F-measure error (from 14.4% to 5.8%) is observed.
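The error figures in the abstract can be checked with a few lines of arithmetic. The sketch below assumes that "F-measure error" means 100% minus the F-measure, which is consistent with the reported 85.6% F-measure and 14.4% error; the function names are illustrative and do not come from the paper.

```python
def f_measure_error(f_measure_pct: float) -> float:
    """Error in percentage points, assuming error = 100 - F-measure."""
    return 100.0 - f_measure_pct


def relative_error_reduction(error_before: float, error_after: float) -> float:
    """Fraction of the original error that the improved model removes."""
    if error_before <= 0:
        raise ValueError("error_before must be positive")
    return (error_before - error_after) / error_before


# Cross-style baseline reported in the abstract: F-measure of 85.6%.
baseline_error = f_measure_error(85.6)  # about 14.4 points
# Error reported after integrating the transfer lexicons.
improved_error = 5.8

reduction = relative_error_reduction(baseline_error, improved_error)
print(f"{reduction:.1%}")  # prints 59.7%
```

The result, about 59.7%, matches the roughly 60% reduction the abstract reports. Under the same assumption, a 5.8% error would correspond to an F-measure of about 94.2%, a figure the abstract does not state directly.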
