Generating computationally-efficient representations of large datasets

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing large datasets using a computationally-efficient representation are disclosed. A request to apply a coverage algorithm to a large input dataset is received. The large dataset includes sets of elements. A computationally-efficient representation of the large dataset is generated by generating a reduced set of elements that contains fewer elements based on a defined probability. For each element in the reduced set, a determination is made regarding whether the element appears in more than a threshold number of sets. When the element appears in more than the threshold number, the element is removed from sets until the element appears in only the threshold number. The coverage algorithm is then applied to the computationally-efficient representation to identify a subset of the sets. The system provides data identifying the subset of the sets in response to the received request.
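The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: each element is kept independently with probability `p`, the sets from which an over-represented element is removed are chosen by sorted set name, and the coverage algorithm is a plain greedy k-cover. The names `build_sketch` and `greedy_cover` are hypothetical, and elements are assumed to be mutually comparable so they can be sorted.

```python
import random


def build_sketch(sets, p, threshold, seed=0):
    """Build a reduced representation of `sets` (a dict: set name -> set of elements).

    Assumption: each element is kept independently with probability p.
    Assumption: when an element appears in more than `threshold` sets, it is
    kept in the first `threshold` sets by sorted name and removed from the rest.
    """
    rng = random.Random(seed)
    universe = sorted({e for members in sets.values() for e in members})

    # Step 1: reduce the element universe by sampling.
    kept = [e for e in universe if rng.random() < p]
    kept_set = set(kept)
    sketch = {name: set(members) & kept_set for name, members in sets.items()}

    # Step 2: cap the number of sets each kept element appears in.
    for e in kept:
        owners = [name for name in sorted(sketch) if e in sketch[name]]
        for name in owners[threshold:]:
            sketch[name].discard(e)
    return sketch


def greedy_cover(sets, k):
    """Pick up to k set names, each time taking the largest marginal gain.

    Assumption: greedy k-cover stands in for the unspecified coverage algorithm.
    """
    covered = set()
    chosen = []
    for _ in range(k):
        best, best_gain = None, 0
        for name, members in sets.items():
            if name in chosen:
                continue
            gain = len(members - covered)
            if gain > best_gain:  # strict '>' keeps the earliest set on ties
                best, best_gain = name, gain
        if best is None:  # nothing left adds new elements
            break
        chosen.append(best)
        covered |= sets[best]
    return chosen


if __name__ == "__main__":
    data = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 4, 5}, "D": {1, 3}}
    sketch = build_sketch(data, p=0.8, threshold=2, seed=42)
    print(greedy_cover(sketch, k=2))
```

The coverage algorithm runs only on the reduced sets, and the names it returns refer back to the original sets, which is how the sketch yields a subset of the input sets in response to the request.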
Bibliographic details

  • Publication number: US11238357B2

  • Publication date: 2022-02-01

  • Original format: PDF

  • Applicant/Assignee: GOOGLE LLC

  • Application number: US201816042975

  • Filing date: 2018-07-23

  • Classification codes: G06F7; G06F16/34; G06F16/23; G06F16/24; G06N7; H04L9/06; G06F17/10; G06F9/448; G06N20; G06F8/30; H04L9/32; G06N5/02; G06N5

  • Country: US

  • Date indexed: 2022-08-24 23:35:02
