Automatically Inferring Data Relationships of Datasets

Abstract

Described herein is a system and method for inferring data relationships among a plurality of datasets. The data contents (and, optionally, the metadata) of the datasets are scanned to extract features of each dataset. Features can relate to the structure of the data, a profile of the data within the dataset, and/or the metadata of the dataset. Each feature has an associated weight. The datasets can be grouped into clusters based on at least some of the weighted features (e.g., based on a sim-hash or min-hash of each dataset). A precise similarity metric is then computed between the datasets in each cluster based on their weighted features. Datasets whose precise similarity metric exceeds a threshold are inferred to be likely related, and information regarding the inferred likely related datasets is provided.
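The abstract describes a two-stage pipeline: a cheap, hash-based clustering step followed by an exact similarity computation only within each cluster. The patent abstract does not specify the features, weights, hash width, or how clusters are formed, so the sketch below fills those in with loudly labeled assumptions: datasets are lists of row dictionaries; the extractor `extract_features` and its weights (3.0 for column names, 2.0 for per-column value types, 1.0 per value occurrence) are hypothetical; clustering is done by splitting a 64-bit weighted sim-hash into bands (locality-sensitive hashing); and the "precise similarity metric" is taken to be weighted Jaccard similarity. None of these choices come from the patent itself.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def extract_features(rows):
    """Hypothetical extractor: structure, profile, and content features with assumed weights."""
    features = defaultdict(float)
    if not rows:
        return {}
    for col in rows[0]:
        features[f"col:{col}"] += 3.0  # structure: column names (assumed weight 3.0)
    for row in rows:
        for col, val in row.items():
            features[f"type:{col}:{type(val).__name__}"] = 2.0  # profile: value type per column
            features[f"val:{col}:{val}"] += 1.0  # content: value frequency
    return dict(features)


def _hash64(token):
    """Stable 64-bit hash of a feature token (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(token.encode(), digest_size=8).digest(), "big")


def simhash(features, bits=64):
    """Weighted sim-hash: each feature votes +weight or -weight on every bit of its hash."""
    acc = [0.0] * bits
    for token, weight in features.items():
        h = _hash64(token)
        for i in range(bits):
            acc[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if acc[i] > 0)


def weighted_jaccard(a, b):
    """Weighted Jaccard similarity: sum of per-feature minima over sum of maxima."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def infer_related(datasets, threshold=0.5, bands=4):
    """Cluster by sim-hash bands, then score pairs inside each cluster with weighted Jaccard."""
    feats = {name: extract_features(rows) for name, rows in datasets.items()}
    band_bits = 64 // bands
    mask = (1 << band_bits) - 1
    clusters = defaultdict(set)
    for name, f in feats.items():
        fp = simhash(f)
        for b in range(bands):
            # Datasets sharing any band of their fingerprint land in the same cluster.
            clusters[(b, (fp >> (b * band_bits)) & mask)].add(name)
    related, seen = {}, set()
    for members in clusters.values():
        for pair in combinations(sorted(members), 2):
            if pair in seen:
                continue  # a pair may share several bands; score it only once
            seen.add(pair)
            score = weighted_jaccard(feats[pair[0]], feats[pair[1]])
            if score > threshold:  # "above a threshold" => inferred likely related
                related[pair] = score
    return related
```

Banding keeps the exact pairwise comparison inside small candidate groups: more bands raise recall at the cost of more candidate pairs. As the abstract notes, a min-hash signature could play the same clustering role in place of the sim-hash.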

Bibliographic data

  • Publication number: US2020278986A1
  • Publication date: 2020-09-03
  • Original format: PDF
  • Applicant/assignee: MICROSOFT TECHNOLOGY LICENSING LLC
  • Application number: US201916289719
  • Inventors: SAIKAT GUHA; GARY KYLE SOELLER
  • Filing date: 2019-03-01
  • Classification: G06F16/28; G06N5/04
  • Country: US
