METHOD OF SELECTING TRAINING TEXT FOR LANGUAGE MODEL, AND METHOD OF TRAINING LANGUAGE MODEL USING THE TRAINING TEXT, AND COMPUTER AND COMPUTER PROGRAM FOR EXECUTING THE METHODS

Abstract

The present invention provides a method of selecting training text for a language model. The method includes generating, from a corpus in a first domain, a template for selecting training text, using one or both of the following generation techniques: (i) replacing one or more words in a word string selected from the first-domain corpus with a special symbol representing any word or word string, and adopting the word string after replacement as the template; and/or (ii) adopting the word string selected from the first-domain corpus, unchanged, as the template. Text covered by the template is then selected as the training text from a corpus in a second domain different from the first domain.
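The two generation techniques and the selection step can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: word strings are taken to be fixed-length n-grams (`n=3`), technique (i) replaces exactly one word per template, the special symbol is written `*`, and a text counts as "covered" if it contains at least one match for any template. The function names `make_templates`, `template_to_regex`, and `select_training_text` are hypothetical.

```python
import re

WILDCARD = "*"  # assumed special symbol standing for any word or word string


def make_templates(sentence, n=3):
    """Build templates from every n-word string of a first-domain sentence."""
    words = sentence.split()
    templates = set()
    for i in range(len(words) - n + 1):
        ngram = words[i:i + n]
        # Technique (ii): adopt the word string unchanged as a template.
        templates.add(tuple(ngram))
        # Technique (i): replace one word with the special symbol.
        for j in range(n):
            templates.add(tuple(ngram[:j] + [WILDCARD] + ngram[j + 1:]))
    return templates


def template_to_regex(template):
    """Compile a template; the wildcard matches one or more whole words."""
    parts = [r"\S+(?: \S+)*" if w == WILDCARD else re.escape(w)
             for w in template]
    # Anchor on spaces or string ends so only whole words match.
    return re.compile(r"(?:^| )" + " ".join(parts) + r"(?: |$)")


def select_training_text(first_domain, second_domain, n=3):
    """Return second-domain texts covered by at least one template."""
    templates = set()
    for sentence in first_domain:
        templates |= make_templates(sentence, n)
    patterns = [template_to_regex(t) for t in templates]
    selected = []
    for text in second_domain:
        normalized = " ".join(text.split())  # collapse runs of whitespace
        if any(p.search(normalized) for p in patterns):
            selected.append(text)
    return selected


first_domain = ["please book a flight to tokyo"]
second_domain = [
    "i want to book a hotel in paris",
    "the weather is nice today",
    "book a flight to osaka",
]
selected = select_training_text(first_domain, second_domain)
```

In this toy run, the first and third second-domain sentences are selected (for example via the templates `book a *` and `book a flight`), while the weather sentence matches no template and is left out. A practical system would likely use a stricter coverage criterion, since a wildcard that matches arbitrary word strings is very permissive.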