GENERATING COMPUTATIONALLY-EFFICIENT REPRESENTATIONS OF LARGE DATASETS

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing large datasets using a computationally-efficient representation are disclosed. A request to apply a coverage algorithm to a large input dataset is received. The large dataset includes sets of elements. A computationally-efficient representation of the large dataset is generated by generating a reduced set of elements, selected based on a defined probability, that contains fewer elements than the large dataset. For each element in the reduced set, a determination is made as to whether the element appears in more than a threshold number of sets. When the element appears in more than the threshold number of sets, it is removed from sets until it appears in only the threshold number of sets. The coverage algorithm is then applied to the computationally-efficient representation to identify a subset of the sets. The system provides data identifying the subset of the sets in response to the received request.
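The two reduction steps described in the abstract (sampling elements by a defined probability, then capping how many sets each sampled element appears in) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `build_sketch` and `greedy_cover`, the parameters `p`, `threshold`, `seed`, and `k`, and the dict-of-sets input layout are all hypothetical. The abstract does not say which sets an over-represented element is removed from, so the sketch assumes the first `threshold` sets in iteration order are kept. The abstract also does not name a specific coverage algorithm, so a standard greedy k-cover is used here as a stand-in.

```python
import random


def build_sketch(sets, p, threshold, seed=0):
    """Return a reduced copy of `sets` (dict: set name -> set of elements).

    Hypothetical helper: the parameter names and the removal order are
    assumptions, not details taken from the abstract.
    """
    rng = random.Random(seed)
    # Step 1: keep each element independently with probability p.
    universe = set().union(*sets.values())
    kept = {e for e in sorted(universe, key=repr) if rng.random() < p}
    sketch = {name: members & kept for name, members in sets.items()}
    # Step 2: if an element appears in more than `threshold` sets, remove it
    # from sets until it appears in exactly `threshold` of them.
    for e in kept:
        owners = [name for name in sketch if e in sketch[name]]
        for name in owners[threshold:]:  # assumption: keep the first ones
            sketch[name].discard(e)
    return sketch


def greedy_cover(sets, k):
    """Stand-in coverage algorithm: greedily pick up to k sets, each time
    taking the set that covers the most not-yet-covered elements."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(sets, key=lambda name: len(sets[name] - covered), default=None)
        if best is None or not (sets[best] - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= sets[best]
    return chosen


if __name__ == "__main__":
    data = {"A": {1, 2, 3}, "B": {2, 3}, "C": {3, 4}, "D": {3}}
    reduced = build_sketch(data, p=1.0, threshold=2)
    print(reduced)
    print(greedy_cover(reduced, k=2))
```

With `p=1.0` every element is kept, so the example isolates the threshold step: element `3` starts in four sets and ends in two. The coverage algorithm then runs on the reduced representation and returns the names of the selected sets, which correspond to the "subset of the sets" returned in response to the request.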
