
Automatically Inferring Data Relationships of Datasets



Abstract

Described herein are a system and a method for inferring data relationships among a plurality of datasets. Data contents (and optionally metadata) of the plurality of datasets are scanned to extract features of each dataset. Features can relate to the structure of the data, a profile of the data within the dataset, and/or metadata of the dataset. Each feature has an associated weight. The datasets can be grouped into clusters based on at least some of the weighted features (e.g., based on a sim-hash or min-hash of the dataset). A precise similarity metric is then computed between the datasets in each cluster based on their weighted features. Datasets whose precise similarity metric is above a threshold quantity are inferred to be likely related. Information regarding the inferred likely related datasets is provided.
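
The steps in the abstract (extract weighted features, cluster cheaply by hash, then score pairs precisely within each cluster) can be illustrated with a minimal Python sketch. It is not the patented implementation. The feature names, the weight values, the dataset layout (a dict with "columns" and optional "metadata"), the banded min-hash clustering, and the use of weighted Jaccard as the "precise similarity metric" are all assumptions made for illustration. As a further simplification, the min-hash here is taken over feature names only and ignores the weights.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# Hypothetical weights; the abstract only says each feature has a weight.
WEIGHTS = {"column": 1.0, "type": 0.5, "meta": 0.25}


def extract_features(dataset):
    """Map a dataset to {feature: weight} from structure, profile, and metadata."""
    features = {}
    for name, values in dataset["columns"].items():
        features[f"column:{name}"] = WEIGHTS["column"]  # structural feature
        kinds = "|".join(sorted({type(v).__name__ for v in values}))
        features[f"type:{name}:{kinds}"] = WEIGHTS["type"]  # data-profile feature
    for key, value in dataset.get("metadata", {}).items():
        features[f"meta:{key}={value}"] = WEIGHTS["meta"]  # metadata feature
    return features


def _hash(seed, feature):
    """Deterministic 64-bit hash of a feature under a given seed."""
    digest = hashlib.sha256(f"{seed}:{feature}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(features, num_hashes=8):
    """Unweighted min-hash over feature names (assumed simplification)."""
    return tuple(
        min((_hash(seed, f) for f in features), default=0)
        for seed in range(num_hashes)
    )


def cluster(signatures, band_size=2):
    """Banding: datasets that agree on any band of the signature share a bucket."""
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for start in range(0, len(sig), band_size):
            buckets[(start, sig[start:start + band_size])].add(name)
    return [names for names in buckets.values() if len(names) > 1]


def weighted_jaccard(a, b):
    """Sum of per-feature minimum weights divided by sum of maximum weights."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def infer_related(datasets, threshold=0.6):
    """Return {(name_a, name_b): score} for pairs scoring above the threshold."""
    features = {n: extract_features(d) for n, d in datasets.items()}
    signatures = {n: minhash_signature(f) for n, f in features.items()}
    related = {}
    for names in cluster(signatures):
        # Precise scoring happens only inside a cluster, not across all pairs.
        for a, b in combinations(sorted(names), 2):
            score = weighted_jaccard(features[a], features[b])
            if score > threshold:
                related[(a, b)] = score
    return related
```

The two-stage design mirrors the abstract's reasoning: the hash-based clustering is approximate but cheap, so the more expensive pairwise metric only runs on datasets that already landed in the same cluster.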


Bibliographic details

Publication number: US2020278986A1

Patent type:
Publication date: 2020-09-03

Original format: PDF
Applicant/assignee: MICROSOFT TECHNOLOGY LICENSING LLC

Application number: US201916289719
Inventors: SAIKAT GUHA; GARY KYLE SOELLER

Filing date: 2019-03-01
Classification: G06F16/28; G06N5/04
Country: US
Database entry time: 2022-08-21 11:21:40
