GENERATING OVERLAP ESTIMATIONS BETWEEN HIGH-VOLUME DIGITAL DATA SETS BASED ON MULTIPLE SKETCH VECTOR SIMILARITY ESTIMATORS

Abstract

The present disclosure relates to systems, methods, and non-transitory computer-readable media that estimate the overlap between sets of data samples. In particular, in one or more embodiments, the disclosed systems utilize a sketch-based sampling routine and a flexible, accurate estimator to determine the overlap (e.g., the intersection) between sets of data samples. For example, in some implementations, the disclosed systems generate a sketch vector—such as a one permutation hashing vector—for each set of data samples. The disclosed systems further compare the sketch vectors to determine an equal bin similarity estimator, a lesser bin similarity estimator, and a greater bin similarity estimator. The disclosed systems utilize one or more of the determined similarity estimators in generating an overlap estimation for the sets of data samples.
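The comparison step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the estimator formulas, so everything below is an assumption rather than the patent's claimed method: the function names (`oph_sketch`, `compare_sketches`, `overlap_from_equal`, `overlap_from_lesser`) are hypothetical, the set sizes are assumed to be known, and the three bin fractions are read as estimates of |A∩B|/|A∪B|, |A\B|/|A∪B|, and |B\A|/|A∪B|, which follows from the fact that the minimum hash in each non-empty bin belongs to exactly one of those three regions.

```python
import hashlib
import math


def oph_sketch(items, k=1024, seed=0):
    """One permutation hashing: hash each item once, keep the minimum per bin."""
    sketch = [math.inf] * k  # math.inf marks an empty bin
    salt = seed.to_bytes(8, "little")
    for item in items:
        digest = hashlib.blake2b(
            repr(item).encode(), digest_size=8, salt=salt
        ).digest()
        h = int.from_bytes(digest, "little")
        b, v = h % k, h // k  # bin index and value within the bin
        if v < sketch[b]:
            sketch[b] = v
    return sketch


def compare_sketches(sa, sb):
    """Return fractions of jointly non-empty bins that are equal, lesser, greater.

    "Lesser" means the bin value of the first sketch is smaller than that of
    the second (an empty bin counts as infinitely large).
    """
    eq = lt = gt = 0
    for a, b in zip(sa, sb):
        if a == math.inf and b == math.inf:
            continue  # bin empty in both sketches carries no information
        if a == b:
            eq += 1
        elif a < b:
            lt += 1
        else:
            gt += 1
    n = eq + lt + gt
    if n == 0:
        return 0.0, 0.0, 0.0
    return eq / n, lt / n, gt / n


def overlap_from_equal(size_a, size_b, eq):
    """Assumed estimator: eq ~ Jaccard J, and |A∩B| = J(|A|+|B|)/(1+J)."""
    return eq * (size_a + size_b) / (1.0 + eq)


def overlap_from_lesser(size_a, eq, lt):
    """Assumed alternative: |A| ~ (eq+lt)|A∪B|, so |A∩B| ~ |A| * eq/(eq+lt)."""
    if eq + lt == 0:
        return 0.0
    return size_a * eq / (eq + lt)


if __name__ == "__main__":
    a = set(range(10_000))
    b = set(range(5_000, 15_000))  # true overlap is 5000
    eq, lt, gt = compare_sketches(oph_sketch(a), oph_sketch(b))
    print(overlap_from_equal(len(a), len(b), eq))
    print(overlap_from_lesser(len(a), eq, lt))
```

Both sketches must use the same `k` and `seed`, otherwise the bins are not comparable. Having more than one estimator available is what would let a system pick or combine them, for example when one set is much smaller than the other; how the patent actually selects among them is not stated in the abstract.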

Bibliographic details

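  • Publication number: US2022138218A1
  • Patent type:
  • Publication date: 2022-05-05
  • Original format: PDF
  • Applicant/assignee: ADOBE INC.
  • Application number: US202017090556
  • Inventors: ANUP RAO; TUNG MAI; MATVEY KAPILEVICH
  • Filing date: 2020-11-05
  • Classification: G06F16/26; G06F16/28; G06T11/20
  • Country: US
  • Database entry time: 2022-08-25 00:49:57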
