
METHOD AND APPARATUS FOR EXPANDING DATA OF BILINGUAL CORPUS AND STORAGE MEDIUM


Abstract

A method and apparatus for expanding the data of a bilingual corpus are disclosed. The method comprises: querying a source language-central language corpus for at least one first central language phrase that matches a word of a first source language phrase; querying the source language-central language corpus for at least one second source language phrase that matches a word of each first central language phrase, and constructing a source language phrase set from the second source language phrases; querying a central language-target language corpus for at least one first target language phrase that matches a word of each first central language phrase, and constructing a target language phrase set from the first target language phrases; forming at least one matched phrase pair by combining a second source language phrase from the source language phrase set with a first target language phrase from the target language phrase set; and storing the at least one matched phrase pair in a source language-target language corpus. Expanding the bilingual corpus in this way addresses the problem of data scarcity in the corpus.
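The steps in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: each corpus is modeled as a list of phrase pairs, "matching a word" is interpreted as sharing at least one whitespace-delimited token, and the names `shares_word`, `lookup`, and `expand`, as well as the French, English, and German sample data, are hypothetical.

```python
from itertools import product


def shares_word(a, b):
    # Assumption: two phrases "match" if they share at least one token.
    return bool(set(a.lower().split()) & set(b.lower().split()))


def lookup(corpus, query, query_side):
    # Return the phrases on the opposite side of every pair whose
    # query_side phrase shares a word with the query.
    other = 1 - query_side
    return {pair[other] for pair in corpus if shares_word(pair[query_side], query)}


def expand(source_phrase, src_central, central_tgt, src_tgt):
    # Step 1: central-language phrases matching the first source phrase.
    central_phrases = lookup(src_central, source_phrase, 0)

    source_set, target_set = set(), set()
    for central in central_phrases:
        # Step 2: source phrases matching each central phrase.
        source_set |= lookup(src_central, central, 1)
        # Step 3: target phrases matching each central phrase.
        target_set |= lookup(central_tgt, central, 0)

    # Step 4: combine the two sets into source-target phrase pairs,
    # skipping pairs the source-target corpus already contains.
    new_pairs = sorted(set(product(source_set, target_set)) - set(src_tgt))

    # Step 5: store the new pairs in the source-target corpus.
    src_tgt.extend(new_pairs)
    return new_pairs


# Hypothetical data: French as source, English as central, German as target.
src_central = [("bonjour", "hello"), ("salut", "hello")]
central_tgt = [("hello", "hallo")]
src_tgt = []

added = expand("bonjour", src_central, central_tgt, src_tgt)
print(added)
```

In this toy run the single query phrase yields two new source-target pairs, because a second source phrase is reachable through the shared central phrase. The full cross product of the two sets can also produce poorly matched pairs; the abstract does not describe any filtering, so none is shown here.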
