GENERATING COMPUTATIONALLY-EFFICIENT REPRESENTATIONS OF LARGE DATASETS

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, are disclosed for processing large datasets using a computationally-efficient representation. A request to apply a coverage algorithm to a large input dataset is received. The large dataset includes sets of elements. A computationally-efficient representation of the large dataset is generated by first generating, based on a defined probability, a reduced set of elements that contains fewer elements than the dataset. For each element in the reduced set, a determination is made as to whether the element appears in more than a threshold number of sets. When it does, the element is removed from sets until it appears in only the threshold number of sets. The coverage algorithm is then applied to the computationally-efficient representation to identify a subset of the sets. The system provides data identifying the subset of the sets in response to the received request.
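
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how elements are sampled, which sets an over-represented element is removed from, or which coverage algorithm is used, so the function names `build_sketch` and `greedy_k_cover`, the parameters `keep_prob` and `max_degree`, the seeded random sampling, the "keep the first occurrences in sorted set order" rule, and the choice of a greedy k-cover are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_sketch(sets, keep_prob, max_degree, seed=0):
    """Build a reduced copy of `sets` (dict: set id -> set of elements).

    Assumes set ids and elements are each mutually comparable (e.g. all
    strings or all ints) so that iteration order is deterministic.
    """
    rng = random.Random(seed)

    # Step 1: keep each distinct element independently with probability
    # keep_prob (an assumed reading of "based on a defined probability").
    universe = sorted(set().union(*sets.values()))
    kept = {e for e in universe if rng.random() < keep_prob}

    # Step 2: cap how many sets each kept element may appear in. Which
    # sets lose the element is not specified in the abstract; this sketch
    # keeps the first max_degree occurrences in sorted set-id order.
    degree = defaultdict(int)
    sketch = {sid: set() for sid in sets}
    for sid in sorted(sets):
        for e in sorted(sets[sid] & kept):
            if degree[e] < max_degree:
                sketch[sid].add(e)
                degree[e] += 1
    return sketch


def greedy_k_cover(sets, k):
    """Pick up to k set ids, each time taking the largest marginal gain.

    Greedy k-cover is an assumed stand-in for "the coverage algorithm".
    """
    covered, chosen = set(), []
    remaining = sorted(sets)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda sid: len(sets[sid] - covered))
        chosen.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    data = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 5}, "d": {3}}
    reduced = build_sketch(data, keep_prob=0.5, max_degree=2)
    print(greedy_k_cover(reduced, k=2))
```

The coverage step runs on the reduced sets only, and the identifiers it returns refer back to the original sets. The sketch is smaller because sampling reduces the number of distinct elements and the cap limits how often any one element can appear.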

Bibliographic details

Publication number: US2019026640A1

Patent type:
Publication date: 2019-01-24

Original format: PDF
Applicant/assignee: GOOGLE LLC

Application number: US201816042975
Inventors: SEYED VAHAB MIRROKNI BANADAKI; HOSSEIN ESFANDIARI; MOHAMMADHOSSEIN BATENI

Filing date: 2018-07-23
Classification: G06N7; H04L9/06; G06F9/448; G06N99; G06F17/10
Country: US
Indexed: 2022-08-21 12:05:57