CORPUS SELECTION DEVICE, CORPUS SELECTION METHOD, AND PROGRAM

Abstract

PROBLEM TO BE SOLVED: To provide a corpus selection device, a corpus selection method, and a program that select a learning corpus capable of both improving the quality of a language model and reducing the amount of storage area used.

SOLUTION: A corpus selection device AA divides the whole learning corpus into subsets (subset 1 to subset 3) and generates, by language modeling, subset language models 1 to 3, one for each subset. For each of subset language models 1 to 3, a perplexity is calculated using a task representation corpus, yielding perplexity-1 to perplexity-Y. The subsets whose subset language models have lower perplexities are removed from the whole learning corpus, and the remainder is taken as the selected learning corpus.

COPYRIGHT: (C)2012, JPO&INPIT
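The procedure in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the abstract does not specify the type of language model or how many subsets are removed, so the add-one-smoothed unigram model, the function names (`train_unigram`, `perplexity`, `select_corpus`), and the `n_remove` parameter are all assumptions. The removal rule follows the abstract's wording as given (subsets with lower perplexity are removed).

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count word frequencies over a list of tokenized sentences (assumed model type)."""
    counts = Counter(word for sent in sentences for word in sent)
    return counts, sum(counts.values())


def perplexity(model, sentences, vocab_size):
    """Perplexity of an add-one-smoothed unigram model on the given sentences."""
    counts, total = model
    log_prob = 0.0
    n_words = 0
    for sent in sentences:
        for word in sent:
            p = (counts[word] + 1) / (total + vocab_size)
            log_prob += math.log(p)
            n_words += 1
    return math.exp(-log_prob / n_words)


def select_corpus(subsets, task_corpus, n_remove):
    """Train one model per subset, score each on the task representation
    corpus, and drop the n_remove subsets with the lowest perplexity
    (n_remove is a hypothetical parameter, not taken from the abstract)."""
    vocab = {w for subset in subsets for sent in subset for w in sent}
    vocab |= {w for sent in task_corpus for w in sent}
    ppl = [perplexity(train_unigram(s), task_corpus, len(vocab)) for s in subsets]
    # Subset indices ordered by ascending perplexity.
    order = sorted(range(len(subsets)), key=lambda i: ppl[i])
    removed = set(order[:n_remove])
    selected = [sent for i, s in enumerate(subsets) if i not in removed for sent in s]
    return selected, ppl, removed
```

Calling `select_corpus(subsets, task_corpus, 1)` with three tokenized subsets returns the remaining sentences as the selected learning corpus, together with the per-subset perplexities and the indices of the removed subsets.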
