Variable representative sampling under resource constraints

Abstract

Embodiments are directed towards generating a representative sampling as a subset of a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including a data source and one or more desired subset types: latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether the number of clusters and/or cluster types generated exceeds at least one threshold; when it does not, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
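
The iterative clustering loop and the subset types described in the abstract can be illustrated with a minimal sketch. The abstract does not state which clustering method, similarity measure, or threshold values are used, so everything here is an assumption: greedy leader clustering over token sets with Jaccard similarity stands in for the clustering step, and the names `Record`, `Cluster`, `cluster_until_enough`, `sample`, `batch_size`, and `cluster_threshold` are hypothetical.

```python
import random
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    timestamp: int
    text: str  # unstructured payload, e.g. a raw log line


@dataclass
class Cluster:
    tokens: frozenset  # token set of the first (leader) record
    members: list = field(default_factory=list)


def tokenize(text):
    # Keep alphabetic runs only, so records differing just in numbers look alike.
    return frozenset(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def add_to_clusters(clusters, batch, similarity=0.5):
    # Assumed clustering step: join the first similar cluster, else start a new one.
    for rec in batch:
        toks = tokenize(rec.text)
        for cluster in clusters:
            if jaccard(toks, cluster.tokens) >= similarity:
                cluster.members.append(rec)
                break
        else:
            clusters.append(Cluster(tokens=toks, members=[rec]))


def cluster_until_enough(records, batch_size=3, cluster_threshold=2):
    # Iteration analysis: keep clustering additional records until the cluster
    # count exceeds the threshold or the source is exhausted.
    clusters, scanned = [], 0
    while scanned < len(records):
        add_to_clusters(clusters, records[scanned:scanned + batch_size])
        scanned += batch_size
        if len(clusters) > cluster_threshold:
            break
    return clusters, min(scanned, len(records))


def sample(records, subset_type, count, clusters=None, seed=0):
    by_time = sorted(records, key=lambda r: r.timestamp)
    if subset_type == "latest":
        return by_time[-count:]
    if subset_type == "earliest":
        return by_time[:count]
    if subset_type == "random":
        return random.Random(seed).sample(records, min(count, len(records)))
    if subset_type == "diverse":
        # One representative from each of the largest clusters.
        ranked = sorted(clusters or [], key=lambda c: len(c.members), reverse=True)
        return [c.members[0] for c in ranked[:count]]
    if subset_type == "outlier":
        # Representatives of the smallest clusters.
        ranked = sorted(clusters or [], key=lambda c: len(c.members))
        return [c.members[0] for c in ranked[:count]]
    raise ValueError(f"unknown subset type: {subset_type}")


records = [
    Record(1, "user alice logged in from 10.0.0.1"),
    Record(2, "user bob logged in from 10.0.0.2"),
    Record(3, "user carol logged in from 10.0.0.3"),
    Record(4, "disk usage at 91 percent on host web1"),
    Record(5, "disk usage at 95 percent on host web2"),
    Record(6, "kernel panic: unable to mount root fs"),
    Record(7, "user dave logged in from 10.0.0.4"),
    Record(8, "user erin logged in from 10.0.0.5"),
]

clusters, scanned = cluster_until_enough(records)
diverse = sample(records, "diverse", 2, clusters=clusters)
outliers = sample(records, "outlier", 1, clusters=clusters)
```

In this sketch the first batch yields a single cluster, which does not exceed the assumed threshold, so a second batch is clustered before sampling. Stopping as soon as the threshold is exceeded is one simple way to bound the work done on a large source; a real system could equally cap elapsed time or records scanned.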

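Bibliographic details

  • Publication number: US8751499B1

    Patent type

  • Publication date: 2014-06-10

    Original format: PDF

  • Applicant/assignee: SPLUNK INC.

    Application number: US201313747153

  • Inventors: R. DAVID CARASSO; MICAH JAMES DELFINO

    Filing date: 2013-01-22

  • Classification: G06F17/30

  • Country: US

  • Database entry time: 2022-08-21 15:59:53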
