
Automatic discovery of relevant data in massive datasets


Abstract

An approach for discovering relevant data in massive datasets. Compare datasets, each including compare key fields and compare data fields, are received together with a core dataset including target data field(s) and core field(s). The compare datasets are categorized into direct and indirect related dataset pools based on the correlation strength of the target data field(s) with matching compare and core fields. The direct related dataset pool and the core dataset are transformed into reduction datasets based on a statistical measure of the values of the target data fields, shared key fields, and compare data fields. Target correlations of the reduction datasets are created based on the reduction compare and target data fields. A statistical relationship strength between the core dataset and the direct related dataset pool is created based on a statistical mean of the target correlations, and a relevancy data store is created.
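
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not name the statistical measure or the correlation, so the sketch assumes a per-key mean for the reduction and Pearson correlation for the target correlations. It also simplifies categorization by treating a compare dataset as "direct related" when its key field matches the core key field. All function names, field names, and sample data are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def reduce_by_key(rows, key, field):
    """Reduction step (assumed): collapse rows to the mean of `field` per key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[field])
    return {k: mean(v) for k, v in groups.items()}


def relevancy_store(core_rows, core_key, target_field, compare_datasets):
    """Return {dataset name: relationship strength} for the direct related pool."""
    core_reduced = reduce_by_key(core_rows, core_key, target_field)
    store = {}
    for name, ds in compare_datasets.items():
        # Assumption: "direct related" means the compare key matches the core key.
        if ds["key"] != core_key:
            continue  # indirect pool; not handled in this sketch
        correlations = []
        for field in ds["data_fields"]:
            reduced = reduce_by_key(ds["rows"], ds["key"], field)
            shared = sorted(set(core_reduced) & set(reduced))
            if len(shared) < 2:
                continue  # too few shared keys to correlate
            correlations.append(
                pearson([core_reduced[k] for k in shared],
                        [reduced[k] for k in shared])
            )
        if correlations:
            # Relationship strength: mean of the per-field target correlations.
            store[name] = mean(correlations)
    return store


# Hypothetical sample data.
core = [
    {"region": "A", "sales": 10}, {"region": "A", "sales": 14},
    {"region": "B", "sales": 20}, {"region": "C", "sales": 30},
]
compare = {
    "weather": {
        "key": "region",
        "data_fields": ["temp"],
        "rows": [{"region": "A", "temp": 1},
                 {"region": "B", "temp": 2},
                 {"region": "C", "temp": 3}],
    },
    "stores": {
        "key": "store_id",
        "data_fields": ["size"],
        "rows": [{"store_id": 1, "size": 5}],
    },
}
store = relevancy_store(core, "region", "sales", compare)
print(store)
```

In this sample, only the hypothetical `weather` dataset shares the `region` key with the core dataset, so it is the only entry in the resulting store; `stores` would fall into the indirect pool, which the sketch leaves unprocessed.
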
