GENERATING OVERLAP ESTIMATIONS BETWEEN HIGH-VOLUME DIGITAL DATA SETS BASED ON MULTIPLE SKETCH VECTOR SIMILARITY ESTIMATORS
Abstract
The present disclosure relates to systems, methods, and non-transitory computer-readable media that estimate the overlap between sets of data samples. In particular, in one or more embodiments, the disclosed systems utilize a sketch-based sampling routine and a flexible, accurate estimator to determine the overlap (e.g., the intersection) between sets of data samples. For example, in some implementations, the disclosed systems generate a sketch vector—such as a one permutation hashing vector—for each set of data samples. The disclosed systems further compare the sketch vectors to determine an equal bin similarity estimator, a lesser bin similarity estimator, and a greater bin similarity estimator. The disclosed systems utilize one or more of the determined similarity estimators in generating an overlap estimation for the sets of data samples.
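The pipeline described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the abstract does not say how the three estimators are defined or combined, so the code below only counts equal, lesser, and greater bins, and it derives an overlap figure from the equal-bin count using the standard one permutation hashing Jaccard estimate together with the identity |A ∩ B| = J(|A| + |B|) / (1 + J). The function names, the BLAKE2b hash, the bin count, the treatment of empty bins as "infinitely large", and the assumption that both set sizes are known are all illustrative choices.

```python
import hashlib

EMPTY = None  # marker for a bin that received no sample (assumption)
HASH_BITS = 64


def oph_sketch(samples, num_bins=1024, seed=0):
    """One permutation hashing: hash each sample once, split the hash
    range into num_bins equal bins, keep the smallest offset per bin."""
    hash_space = 1 << HASH_BITS
    if hash_space % num_bins != 0:
        raise ValueError("num_bins must divide the hash space evenly")
    bin_width = hash_space // num_bins
    sketch = [EMPTY] * num_bins
    for sample in samples:
        data = f"{seed}:{sample}".encode()
        digest = hashlib.blake2b(data, digest_size=HASH_BITS // 8).digest()
        bin_index, offset = divmod(int.from_bytes(digest, "big"), bin_width)
        if sketch[bin_index] is EMPTY or offset < sketch[bin_index]:
            sketch[bin_index] = offset
    return sketch


def compare_bins(sketch_a, sketch_b):
    """Count bins where A's value equals, is less than, or is greater
    than B's. An empty bin is treated as larger than any value."""
    equal = lesser = greater = both_empty = 0
    for a, b in zip(sketch_a, sketch_b):
        if a is EMPTY and b is EMPTY:
            both_empty += 1
        elif a == b:
            equal += 1
        elif b is EMPTY or (a is not EMPTY and a < b):
            lesser += 1
        else:
            greater += 1
    return equal, lesser, greater, both_empty


def estimate_overlap(sketch_a, sketch_b, size_a, size_b):
    """Hypothetical overlap estimate from the equal-bin count only."""
    equal, _, _, both_empty = compare_bins(sketch_a, sketch_b)
    non_empty = len(sketch_a) - both_empty
    if non_empty == 0:
        return 0.0
    jaccard = equal / non_empty  # standard OPH Jaccard estimate
    return jaccard * (size_a + size_b) / (1 + jaccard)


if __name__ == "__main__":
    set_a = range(0, 10_000)
    set_b = range(5_000, 15_000)  # true overlap is 5,000 samples
    sk_a, sk_b = oph_sketch(set_a), oph_sketch(set_b)
    print(compare_bins(sk_a, sk_b))
    print(estimate_overlap(sk_a, sk_b, len(set_a), len(set_b)))
```

Both sketches must be built with the same seed and bin count, since the comparison is only meaningful bin by bin. The lesser and greater counts are returned but not used in the estimate here; how the disclosed systems weigh them is not stated in the abstract.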