AUTOMATIC JOINING OF DATA SETS BASED ON STATISTICS OF FIELD VALUES IN THE DATA SETS
Abstract
A computer system processes arbitrary data sets to identify fields of data that can be the basis of a join operation. Each data set has a plurality of entries, with each entry having a plurality of fields. For each pair of data sets, the computer system compares the values of fields in a first data set in the pair of data sets to the values of fields in a second data set in the pair of data sets, to identify fields having substantially similar sets of values. Given pairs of fields that have similar sets of values, the computer system measures entropy with respect to an intersection of the sets of values of the pair of fields. The computer system can recommend fields for a join operation between any pair of data sets in the plurality of data sets based on such statistical measures.
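The abstract does not say how similarity or entropy is computed, so the following is only a minimal sketch under stated assumptions: Jaccard similarity stands in for "substantially similar sets of values", and Shannon entropy is taken over each field's values restricted to the intersection of the two value sets. The function names, the `min_similarity` threshold, and the ranking rule are all hypothetical and are not taken from the patent.

```python
import math
from collections import Counter
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two value collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entropy_over_intersection(values, shared):
    """Shannon entropy (bits) of `values`, counting only values in `shared`."""
    counts = Counter(v for v in values if v in shared)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def recommend_join_fields(left, right, min_similarity=0.5):
    """Rank field pairs as join candidates.

    `left` and `right` map field name -> list of values (one per entry).
    `min_similarity` is an illustrative threshold, not a value from the source.
    """
    candidates = []
    for (lf, lv), (rf, rv) in product(left.items(), right.items()):
        sim = jaccard(lv, rv)
        if sim < min_similarity:
            continue  # value sets are not similar enough
        shared = set(lv) & set(rv)
        # Use the lower of the two entropies: a field pair is only as
        # discriminating as its less varied side (an assumption of this sketch).
        score = min(
            entropy_over_intersection(lv, shared),
            entropy_over_intersection(rv, shared),
        )
        candidates.append((lf, rf, sim, score))
    # Higher entropy first, then higher similarity.
    candidates.sort(key=lambda c: (c[3], c[2]), reverse=True)
    return candidates


if __name__ == "__main__":
    orders = {"customer_id": [1, 2, 3, 4], "status": ["a", "b", "a", "b"]}
    customers = {"id": [1, 2, 3, 4], "active": ["a", "b", "b", "b"]}
    for row in recommend_join_fields(orders, customers):
        print(row)
```

In this sketch, a pair of low-cardinality fields such as status flags can have identical value sets yet low entropy, so it ranks below a pair of key-like fields whose shared values are spread evenly. That is one plausible reading of why an entropy measure would follow the similarity check.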