AUTOMATIC JOINING OF DATA SETS BASED ON STATISTICS OF FIELD VALUES IN THE DATA SETS

Abstract

A computer system processes arbitrary data sets to identify fields of data that can be the basis of a join operation. Each data set has a plurality of entries, with each entry having a plurality of fields. For each pair of data sets, the computer system compares the values of fields in a first data set in the pair of data sets to the values of fields in a second data set in the pair of data sets, to identify fields having substantially similar sets of values. Given pairs of fields that have similar sets of values, the computer system measures entropy with respect to an intersection of the sets of values of the pair of fields. The computer system can recommend fields for a join operation between any pair of data sets in the plurality of data sets based on such statistical measures.
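The abstract does not say which similarity measure is used or exactly how entropy is computed, so the following is only a minimal sketch under stated assumptions: Jaccard similarity stands in for "substantially similar sets of values", and Shannon entropy is taken over the frequency of each field's values that fall inside the intersection. The names `jaccard`, `entropy_over`, `recommend_join_fields`, and the `min_similarity` threshold are hypothetical, as are the sample data sets.

```python
import math
from collections import Counter
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entropy_over(values, allowed):
    """Shannon entropy in bits of the values that fall inside `allowed`."""
    counts = Counter(v for v in values if v in allowed)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def recommend_join_fields(left, right, min_similarity=0.5):
    """Rank (left_field, right_field) pairs as join candidates.

    `left` and `right` map a field name to the list of values in that field.
    """
    candidates = []
    for (lf, lvals), (rf, rvals) in product(left.items(), right.items()):
        lset, rset = set(lvals), set(rvals)
        sim = jaccard(lset, rset)
        if sim < min_similarity:
            continue  # value sets are not similar enough to consider
        common = lset & rset
        # Assumption: score by the lower of the two entropies, since a field
        # dominated by a few values discriminates poorly as a join key.
        score = min(entropy_over(lvals, common), entropy_over(rvals, common))
        candidates.append((lf, rf, sim, score))
    # Highest entropy first; similarity breaks ties.
    return sorted(candidates, key=lambda c: (c[3], c[2]), reverse=True)


# Hypothetical data sets, each given as field name -> column of values.
orders = {
    "customer_id": [1, 2, 2, 3, 4],
    "status": ["open", "open", "closed", "open", "open"],
}
customers = {
    "id": [1, 2, 3, 4],
    "state": ["open", "closed", "open", "open"],
}
ranked = recommend_join_fields(orders, customers)
for lf, rf, sim, score in ranked:
    print(f"{lf} <-> {rf}: similarity={sim:.2f}, entropy={score:.2f}")
```

In this sketch both field pairs have identical value sets, so similarity alone cannot separate them. The entropy score ranks `customer_id`/`id` above `status`/`state`, because the identifier values are spread nearly evenly while the status values are dominated by a single value.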
