
Method for building parallel corpora


Abstract

A method for identifying documents for enriching a statistical translation tool includes retrieving a source document that is responsive to a source language query, which may be specific to a selected domain. A set of text segments is extracted from the retrieved source document and translated into corresponding target language segments with the statistical translation tool to be enriched. Target language queries are formulated from the target language segments. Sets of target documents responsive to the target language queries are retrieved. The sets of retrieved target documents are filtered, which includes identifying any candidate documents that meet a selection criterion based on the co-occurrence of a document in a plurality of the sets. Any candidate documents found are compared with the retrieved source document to determine whether any of them match the source document. Matching documents can then be stored and used in turn in a training phase to enrich the translation tool.
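The abstract describes a retrieve, translate, re-query, and filter pipeline. The sketch below is a minimal Python illustration under stated assumptions, not the patented implementation: the helper names (`extract_segments`, `translate`, `search`, `similarity`), the toy English-to-French lexicon and target corpus, reading "a plurality of the sets" as `min_sets=2`, and the word-overlap (Jaccard) comparison are all hypothetical stand-ins. It starts from an already-retrieved source document and stops before the training phase.

```python
from collections import Counter
from typing import Callable, Iterable

# --- Generic pipeline (hypothetical names; a sketch, not the patent's code) ---

def cooccurrence_candidates(result_sets: Iterable[Iterable[str]], min_sets: int = 2) -> list[str]:
    """Keep documents that occur in at least `min_sets` of the retrieved sets."""
    counts = Counter()
    for results in result_sets:
        counts.update(set(results))  # a document counts at most once per query
    return sorted(doc for doc, n in counts.items() if n >= min_sets)


def find_parallel_documents(
    source_doc: str,
    extract_segments: Callable[[str], list[str]],
    translate: Callable[[str], str],
    search: Callable[[str], list[str]],
    similarity: Callable[[str, str], float],
    min_sets: int = 2,
    threshold: float = 0.5,
) -> list[str]:
    segments = extract_segments(source_doc)               # source text segments
    target_queries = [translate(s) for s in segments]     # translated segments used as queries
    result_sets = [search(q) for q in target_queries]     # one result set per query
    candidates = cooccurrence_candidates(result_sets, min_sets)
    # Compare each candidate with the source document and keep the matches.
    return [c for c in candidates if similarity(source_doc, c) >= threshold]

# --- Toy components (assumed for illustration only) ---

LEXICON = {"statistical": "statistique", "translation": "traduction", "parallel": "parallèle",
           "corpus": "corpus", "domain": "domaine", "tool": "outil"}

SOURCE_DOC = "Statistical translation. Parallel corpus. Domain tool."
DOC_A = "Traduction statistique et corpus parallèle pour un outil de domaine"
DOC_B = "Corpus parallèle"
DOC_C = "Traduction statistique"
DOC_D = "Traduction statistique corpus parallèle avec beaucoup de mots sans rapport ici"
TARGET_CORPUS = [DOC_A, DOC_B, DOC_C, DOC_D]


def tokens(text: str) -> list[str]:
    return text.lower().replace(".", " ").split()


def extract_segments(text: str) -> list[str]:
    return [s.strip() for s in text.split(".") if s.strip()]  # one segment per sentence


def translate(text: str) -> str:
    return " ".join(LEXICON.get(w, w) for w in tokens(text))  # word-by-word lookup


def search(query: str) -> list[str]:
    wanted = set(tokens(query))
    return [d for d in TARGET_CORPUS if wanted <= set(tokens(d))]  # docs containing every query word


def similarity(source: str, candidate: str) -> float:
    src, cand = set(translate(source).split()), set(tokens(candidate))
    union = src | cand
    return len(src & cand) / len(union) if union else 0.0  # Jaccard overlap


if __name__ == "__main__":
    print(find_parallel_documents(SOURCE_DOC, extract_segments, translate, search, similarity))
```

Here `DOC_D` appears in two of the three result sets, so it passes the co-occurrence filter, but its low word overlap with the source document rejects it at the comparison step; only `DOC_A` is returned. In a real system, `search` would query a target-language search engine and `translate` would be the statistical translation tool being enriched.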

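Bibliographic data

  • Publication number: US7949514B2
  • Publication date: 2011-05-24
  • Original format: PDF
  • Applicant/patentee: FRANCOIS PACULL
  • Application number: US20070789089
  • Inventor: FRANCOIS PACULL
  • Filing date: 2007-04-20
  • Classification: G06F17/28
  • Country: US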

