Selection of a representative data subset of a set of unstructured data

Abstract

Embodiments are directed towards generating a representative sampling as a subset from a larger dataset that includes unstructured data. A graphical user interface enables a user to provide various data selection parameters, including specifying a data source and one or more subset types desired, including one or more of latest records, earliest records, diverse records, outlier records, and/or random records. Diverse and/or outlier subset types may be obtained by generating clusters from an initial selection of records obtained from the larger dataset. An iteration analysis is performed to determine whether a sufficient number of clusters and/or cluster types have been generated that exceed at least one threshold and when not exceeded, additional clustering is performed on additional records. From the resultant clusters, and/or other subtype results, a subset of records is obtained as the representative sampling subset.
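The iterative clustering described in the abstract can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm, so everything here is an assumption made for illustration: the `signature` key (letter and digit runs collapsed so that records sharing a punctuation pattern fall into one cluster), the function `representative_subset`, and the parameters `batch_size`, `cluster_threshold`, and `max_batches` are all hypothetical and are not taken from the patent. Only the count-of-clusters threshold is modeled; the "cluster types" criterion is left out.

```python
import re
from collections import defaultdict
from itertools import islice


def signature(record: str) -> str:
    """Hypothetical clustering key: collapse letter runs to 'a' and digit runs to '0'."""
    collapsed = re.sub(r"[A-Za-z]+", "a", record)
    return re.sub(r"\d+", "0", collapsed)


def representative_subset(records, batch_size=4, cluster_threshold=2, max_batches=10):
    """Cluster records batch by batch until the cluster count exceeds the threshold.

    Returns (diverse, outliers): one record per cluster, largest cluster first,
    and the records that ended up alone in a cluster.
    """
    clusters = defaultdict(list)  # signature -> records sharing that signature
    source = iter(records)
    for _ in range(max_batches):
        batch = list(islice(source, batch_size))
        if not batch:
            break  # data source exhausted
        for record in batch:
            clusters[signature(record)].append(record)
        # Iteration analysis (assumed form): stop once enough clusters exist,
        # otherwise pull additional records and cluster again.
        if len(clusters) > cluster_threshold:
            break
    ordered = sorted(clusters.values(), key=len, reverse=True)
    diverse = [members[0] for members in ordered]
    outliers = [members[0] for members in ordered if len(members) == 1]
    return diverse, outliers
```

Called on a list of raw log lines, the function keeps reading batches only while the clusters found so far do not exceed `cluster_threshold`, which mirrors the "when not exceeded, additional clustering is performed on additional records" step. Treating singleton clusters as outliers is one simple choice among many; a real system might instead rank clusters by size or by distance from the others.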

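Bibliographic details

  • Publication number: US11232124B2

  • Patent type:

  • Publication date: 2022-01-25

  • Original format: PDF

  • Applicant/assignee: SPLUNK INC.

  • Application number: US202016751063

  • Inventors: R. DAVID CARASSO; MICAH JAMES DELFINO

  • Filing date: 2020-01-23

  • Classification: G06F16/25; G06F16/35; G06F16/28; G06F16/904; G06F7/24; G06F3/0482; G06F3/0484; G06F3/0488

  • Country: US

  • Date added to database: 2022-08-24 23:30:50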
