METHODS FOR AUTOMATIC GENERATION OF PARALLEL CORPORA

Abstract

A method of forming parallel corpora comprises receiving sets of items in a first language and a second language, each of the sets having one or more associated descriptions and metadata. The metadata is collected from the two sets of items, and the items are aligned using the metadata. The aligned metadata is mapped from the first language to the second language for each of the sets. The descriptions of two items are fetched, and the structural similarity of the descriptions is measured to assess whether the two items are likely to be translations of each other. For mapped items with structurally similar descriptions, the mapped item descriptions are formed into respective sentences in the first language and in the second language. The sentences form parallel corpora, which may be used to translate an item from the first language to the second language and to train a machine translation system.
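
The steps in the abstract can be illustrated with a minimal Python sketch. This is not the patent's implementation: the `Item` record, the choice of a shared metadata key for alignment, the similarity score (sentence-count ratio plus overlap of numeric tokens), the 0.8 threshold, and the position-by-position sentence pairing are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical item record; field names are illustrative, not from the patent.
    item_id: str
    metadata: dict
    description: str


def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def align_by_metadata(first_items, second_items, key):
    # Assumption: two items correspond when they share the same value for `key`.
    index = {it.metadata[key]: it for it in second_items if key in it.metadata}
    return [
        (it, index[it.metadata[key]])
        for it in first_items
        if it.metadata.get(key) in index
    ]


def structural_similarity(desc_a, desc_b):
    # Assumed score in [0, 1] built from two language-independent cues:
    # the ratio of sentence counts and the overlap of numeric tokens.
    sents_a, sents_b = split_sentences(desc_a), split_sentences(desc_b)
    if not sents_a or not sents_b:
        return 0.0
    count_ratio = min(len(sents_a), len(sents_b)) / max(len(sents_a), len(sents_b))
    nums_a = set(re.findall(r"\d+", desc_a))
    nums_b = set(re.findall(r"\d+", desc_b))
    union = nums_a | nums_b
    num_overlap = len(nums_a & nums_b) / len(union) if union else 1.0
    return 0.5 * count_ratio + 0.5 * num_overlap


def build_parallel_corpus(first_items, second_items, key, threshold=0.8):
    # Keep aligned items whose descriptions look structurally alike, then
    # pair their sentences by position (an assumption of this sketch).
    corpus = []
    for a, b in align_by_metadata(first_items, second_items, key):
        if structural_similarity(a.description, b.description) < threshold:
            continue
        sents_a = split_sentences(a.description)
        sents_b = split_sentences(b.description)
        if len(sents_a) == len(sents_b):
            corpus.extend(zip(sents_a, sents_b))
    return corpus
```

Calling `build_parallel_corpus(english_items, german_items, "sku")` on two item lists that share a hypothetical `sku` metadata field would return a list of `(first_language_sentence, second_language_sentence)` tuples. A production system would likely need a more robust sentence splitter and a richer similarity measure.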

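Bibliographic details

  • Publication number: US2018253421A1
  • Publication date: 2018-09-06
  • Original format: PDF
  • Applicant/assignee: PAYPAL INC.
  • Application number: US201815884336
  • Filing date: 2018-01-30
  • Classification: G06F17/28; G06Q30/06
  • Country: US
  • Date added to database: 2022-08-21 12:55:58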
