Dataset connector and crawler to identify data lineage and segment data

Abstract

Systems and methods for connecting datasets are disclosed. For example, a system may include a memory unit storing instructions and a processor configured to execute the instructions to perform operations. The operations may include receiving a plurality of datasets and a request to identify a cluster of connected datasets among the received plurality of datasets. The operations may include selecting a dataset. In some embodiments, the operations include identifying a data schema of the selected dataset and determining a statistical metric of the selected dataset. The operations may include identifying foreign key scores. The operations may include generating a plurality of edges between the datasets based on the foreign key scores, the data schema, and the statistical metric. The operations may include segmenting and returning datasets based on the plurality of edges.
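The operations in the abstract can be illustrated with a minimal sketch. The abstract does not say how the foreign key scores, data schema, or statistical metric are computed, so everything below is an assumption made for illustration: the score is the share of distinct values in one column that also appear in another, the schema check compares Python value types, the statistical metric is the fraction of distinct values in the candidate key column, and segmentation is a connected-components search over the resulting edges. All function names, thresholds, and sample datasets are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def column_type(values):
    """Schema stand-in (assumption): the set of Python type names in a column."""
    return frozenset(type(v).__name__ for v in values)


def uniqueness(values):
    """Statistical-metric stand-in (assumption): fraction of distinct values."""
    return len(set(values)) / len(values) if values else 0.0


def foreign_key_score(child, parent):
    """Hypothetical score: share of distinct child values found in parent."""
    child_set, parent_set = set(child), set(parent)
    if not child_set:
        return 0.0
    return len(child_set & parent_set) / len(child_set)


def build_edges(datasets, min_score=0.9, min_parent_uniqueness=0.99):
    """Return undirected edges (frozensets of two dataset names)."""
    edges = set()
    for (a, cols_a), (b, cols_b) in permutations(datasets.items(), 2):
        for child in cols_a.values():
            for parent in cols_b.values():
                if column_type(child) != column_type(parent):
                    continue  # schema mismatch
                if uniqueness(parent) < min_parent_uniqueness:
                    continue  # parent column does not look like a key
                if foreign_key_score(child, parent) >= min_score:
                    edges.add(frozenset((a, b)))
    return edges


def segment(datasets, edges):
    """Group dataset names into clusters of connected datasets."""
    neighbors = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, clusters = set(), []
    for name in datasets:
        if name in seen:
            continue
        stack, cluster = [name], set()
        while stack:  # depth-first walk over the edge graph
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbors[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


# Hypothetical input: each dataset maps column names to lists of values.
datasets = {
    "customers": {"id": [1, 2, 3]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 2, 3]},
    "weather": {"city": ["Oslo", "Lima"]},
}
edges = build_edges(datasets)
clusters = segment(datasets, edges)
```

With this sample input, `orders.customer_id` is fully contained in the unique column `customers.id`, so those two datasets share an edge and form one cluster, while `weather` stays in a cluster of its own. A real system would likely weigh the three signals together rather than apply them as hard filters; the filters here only keep the sketch short.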
