USE OF SMALL UNIT LANGUAGE MODEL FOR TRAINING LARGE UNIT LANGUAGE MODELS

Abstract

A computer-implemented method, computer program product, and apparatus are provided. The method includes generating a plurality of sequences of small unit tokens from a first language model that is trained with a small unit corpus including the small unit tokens, the small unit corpus having been derived by tokenization with a small unit. The method further includes tokenizing the plurality of sequences of small unit tokens by a large unit that is larger than the small unit, to create a derived large unit corpus including derived large unit tokens.
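
The two steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say what the units are or what kind of model is used, so the sketch assumes characters as the small unit, whitespace-delimited words as the large unit, and a toy bigram model as the first language model. All function names and the sample corpus are hypothetical.

```python
import random
from collections import defaultdict

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (assumption)

def tokenize_small(text):
    """Small unit tokenization: one token per character (assumption)."""
    return list(text)

def train_bigram(corpus):
    """Count bigram transitions over small unit tokens."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        tokens = [BOS] + tokenize_small(sentence) + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, rng, max_len=40):
    """Sample one sequence of small unit tokens from the bigram model."""
    out, prev = [], BOS
    for _ in range(max_len):
        successors = counts[prev]
        nxt = rng.choices(list(successors), weights=list(successors.values()))[0]
        if nxt == EOS:
            break
        out.append(nxt)
        prev = nxt
    return out

def tokenize_large(small_tokens):
    """Large unit tokenization: join characters, then split into words."""
    return "".join(small_tokens).split()

# Step 0: a small unit corpus (toy data, purely illustrative).
small_corpus = ["the cat sat", "the dog ran", "a cat ran"]

# Step 1: generate sequences of small unit tokens from the first model.
model = train_bigram(small_corpus)
rng = random.Random(0)
sequences = [generate(model, rng) for _ in range(5)]

# Step 2: re-tokenize each sequence by the large unit.
derived_large_corpus = [tokenize_large(seq) for seq in sequences]
```

Because the model samples at the small unit level, the derived large unit corpus may contain large unit tokens that never appeared in the original text. The title suggests that this derived corpus is then used to train a large unit language model, a step the sketch does not cover.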