Re-sizing data partitions for ensemble models in a MapReduce framework

Abstract

Techniques are described for revising data partition size for use in generating predictive models. In one example, a method includes determining an initial number of base model partitions of data from a plurality of data sources; determining an initial base model partition size based at least in part on the initial number of base model partitions; and evaluating the initial base model partition size at least in part with reference to at least one base model partition size reference. The method further includes determining a finalized number of base model partitions based at least in part on the initial base model partition size; determining a revised base model partition size; and generating revised base models based at least in part on the revised base model partition size, including using a predictive modeling framework to randomly assign input data records from the plurality of data sources into the base model partitions.
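
The steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything specific here is an assumption made for illustration: the function names `resize_partitions` and `assign_records`, the use of a minimum and maximum size as the "partition size reference", and the choice of the midpoint of that range as the target size when the initial size falls outside it.

```python
import random
from collections import defaultdict


def resize_partitions(total_records, initial_partitions, min_size, max_size):
    """Return (finalized_partition_count, revised_partition_size).

    Assumption: the size reference is a [min_size, max_size] range, and an
    out-of-range initial size is corrected by targeting the range midpoint.
    """
    # Initial partition size follows from the initial partition count.
    initial_size = total_records / initial_partitions

    # Evaluate the initial size against the reference range.
    if min_size <= initial_size <= max_size:
        final_count = initial_partitions
    else:
        target_size = (min_size + max_size) / 2
        final_count = max(1, round(total_records / target_size))

    # Revised size follows from the finalized count.
    revised_size = total_records / final_count
    return final_count, revised_size


def assign_records(records, partition_count, seed=0):
    """Randomly assign each input record to one of the base model partitions."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    partitions = defaultdict(list)
    for record in records:
        partitions[rng.randrange(partition_count)].append(record)
    return partitions


if __name__ == "__main__":
    count, size = resize_partitions(1_000_000, 4, 5_000, 15_000)
    print(count, size)
    parts = assign_records(range(1_000), 5)
    print({k: len(v) for k, v in sorted(parts.items())})
```

In a real MapReduce job the random assignment would typically happen inside the map phase, with the partition index used as the key so that each reducer builds one base model; the sketch above keeps everything in memory purely to show the arithmetic.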