METHOD AND DEVICE FOR GENERATING TRAINING DATA FOR TRAINING STATISTICAL MACHINE TRANSLATION DEVICE, PARAPHRASE DEVICE, METHOD FOR TRAINING THE SAME, AND DATA PROCESSING SYSTEM AND COMPUTER PROGRAM FOR THE METHOD

Abstract

PROBLEM TO BE SOLVED: To provide a method for shortening a sentence without omitting information.

SOLUTION: The method for generating training data for training a statistical machine translation device 28 comprises a step of preparing a corpus 12 including a plurality of sentences in a prescribed language, a step of clustering similar sentences in the corpus 12 into a plurality of clusters 16, a step 18 of selecting, from the plurality of clusters 16, the clusters of a selected granularity, a step 18 of selecting, in each cluster of the selected granularity, one sentence whose length satisfies a prescribed criterion, and a step 18 of pairing each sentence in each cluster of the selected granularity with the one selected sentence.

COPYRIGHT: (C)2004,JPO&NCIPI
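The steps in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the abstract does not say how similarity is measured, how clusters are formed, or what the length criterion is. The token-set Jaccard similarity, the greedy single-pass clustering, the `threshold` parameter standing in for the selected granularity, and the "fewest tokens" criterion are all assumptions, as are the function names.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (an assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cluster_sentences(corpus: List[str], threshold: float) -> List[List[str]]:
    """Greedy single-pass clustering (assumed method).

    `threshold` stands in for the selected granularity: a lower value
    merges more sentences into each cluster.
    """
    clusters: List[List[str]] = []
    for sentence in corpus:
        for cluster in clusters:
            # Compare only against the cluster's first member (its seed).
            if jaccard(sentence, cluster[0]) >= threshold:
                cluster.append(sentence)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([sentence])
    return clusters


def make_training_pairs(
    corpus: List[str], threshold: float = 0.5
) -> List[Tuple[str, str]]:
    """Pair each sentence with the selected sentence of its cluster."""
    pairs: List[Tuple[str, str]] = []
    for cluster in cluster_sentences(corpus, threshold):
        # Assumed length criterion: the sentence with the fewest tokens.
        target = min(cluster, key=lambda s: len(s.split()))
        for sentence in cluster:
            # Identity pairs are skipped here; this is a design choice
            # of the sketch, not something the abstract specifies.
            if sentence != target:
                pairs.append((sentence, target))
    return pairs


if __name__ == "__main__":
    corpus = [
        "could you please tell me where the station is",
        "where is the station",
        "please tell me where the station is",
        "i would like a cup of coffee please",
        "a cup of coffee please",
    ]
    for source, target in make_training_pairs(corpus, threshold=0.4):
        print(f"{source!r} -> {target!r}")
```

Each resulting (longer sentence, shorter sentence) pair could then serve as a source/target example when training a statistical translation model to paraphrase within one language. Lowering `threshold` makes the clusters coarser, so more sentences share the same short target.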