METHOD AND APPARATUS FOR FINDING CLUSTER IN DATA STREAM AS INFINITE DATA SET HAVING DATA OBJECTS TO BE CONTINUOUSLY GENERATED

Abstract

Disclosed is a method and apparatus for finding a cluster in a data stream, that is, an infinite data set whose data elements are generated continuously. In a method according to an embodiment of the present invention, statistical distribution information on previously generated data elements is managed using grid-cells. Each grid-cell covers a partitioned range of the data space and holds the statistical distribution information of the data elements falling within that range. The method includes the steps of: (a) updating the statistical distribution information of the grid-cell that corresponds to the currently generated data element; (b) comparing the occurrence frequency of data elements in that grid-cell, as given by the updated statistical distribution information, with a predefined partitioning threshold, partitioning the grid-cell into a plurality of grid-cells according to the comparison result, and estimating the statistical distribution information of the partitioned grid-cells; (c) recursively performing step (a) or (b) until the grid-cell becomes a unit grid-cell of a predefined size; and (d) comparing the occurrence frequency of data elements in each unit grid-cell with a predefined minimum support and, according to the comparison result, defining a set of a plurality of unit grid-cells as a cluster.
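The abstract does not give an implementation. What follows is a minimal, hypothetical Python sketch of steps (a) to (d). It assumes that the data space is a d-dimensional box divided into `units_per_dim` unit grid-cells per dimension, with that count a power of two. It also makes several simplifications. A grid-cell's statistical distribution information is reduced to a single occurrence count. Partitioning halves a cell along every dimension. The counts of the new cells are estimated by assuming a uniform distribution inside the parent. The minimum support is a fraction of all elements seen so far. A cluster is a face-connected set of dense unit grid-cells. The class, method, and parameter names (`GridStreamClusterer`, `partition_threshold`, `min_support`, and so on) are illustrative and do not come from the patent.

```python
from itertools import product


class GridStreamClusterer:
    """Hypothetical sketch of steps (a)-(d); not the patented implementation."""

    def __init__(self, lo, unit_size, units_per_dim, dims, partition_threshold, min_support):
        # Assumption: units_per_dim is a power of two so cells halve cleanly down to unit size.
        assert units_per_dim > 0 and units_per_dim & (units_per_dim - 1) == 0
        self.lo = lo
        self.unit_size = unit_size
        self.n = units_per_dim
        self.dims = dims
        self.partition_threshold = partition_threshold
        self.min_support = min_support  # assumption: a fraction of all elements seen
        self.total = 0
        # Leaf grid-cells: (origin in unit coordinates, side length in units) -> estimated count.
        self.leaves = {((0,) * dims, units_per_dim): 0.0}

    def _unit_coords(self, point):
        # Map a point to integer unit-cell coordinates, clamped to the data space.
        return tuple(min(self.n - 1, max(0, int((x - self.lo) // self.unit_size))) for x in point)

    def _leaf_of(self, u):
        # Leaves tile the space with aligned cells, so exactly one of them contains u.
        size = self.n
        while size >= 1:
            key = (tuple((c // size) * size for c in u), size)
            if key in self.leaves:
                return key
            size //= 2
        raise KeyError(u)

    def insert(self, point):
        self.total += 1
        u = self._unit_coords(point)
        key = self._leaf_of(u)
        self.leaves[key] += 1.0  # step (a): update the cell's statistics (here: its count)
        origin, size = key
        # Steps (b)-(c): partition while the cell is frequent and still larger than a unit cell.
        while size > 1 and self.leaves[key] >= self.partition_threshold:
            count = self.leaves.pop(key)
            half = size // 2
            share = count / 2 ** self.dims  # assumption: uniform estimate for the new cells
            for offs in product((0, half), repeat=self.dims):
                child = tuple(o + d for o, d in zip(origin, offs))
                self.leaves[(child, half)] = share
            key = (tuple((c // half) * half for c in u), half)  # cell now holding u
            origin, size = key

    def clusters(self):
        # Step (d): unit cells whose support reaches min_support are dense.
        dense = {o for (o, s), c in self.leaves.items()
                 if s == 1 and self.total and c / self.total >= self.min_support}
        # Assumption: a cluster is a set of dense unit cells connected through shared faces.
        seen, result = set(), []
        for start in sorted(dense):
            if start in seen:
                continue
            comp, stack = set(), [start]
            seen.add(start)
            while stack:
                cell = stack.pop()
                comp.add(cell)
                for axis in range(self.dims):
                    for step in (-1, 1):
                        nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            result.append(comp)
        return result


if __name__ == "__main__":
    c = GridStreamClusterer(lo=0.0, unit_size=1.0, units_per_dim=8, dims=2,
                            partition_threshold=4, min_support=0.1)
    for _ in range(20):
        c.insert((1.5, 1.5))
    for _ in range(20):
        c.insert((6.5, 6.5))
    print(c.clusters())  # → [{(1, 1)}, {(6, 6)}]
```

In this sketch only frequently hit regions are refined down to unit size, while sparse regions stay as coarse cells. Memory therefore tracks where the stream concentrates rather than the full unit grid. The uniform split estimate is the simplest choice. A variant that keeps per-dimension means and variances could estimate the new cells' counts more accurately.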

Bibliographic data

  • Publication number: US2009112514A1

  • Publication date: 2009-04-30

  • Applicant/Assignee: WON-SUK LEE

  • Application number: US20080038649

  • Inventor: WON-SUK LEE

  • Filing date: 2008-02-27

  • Classification: G06F17/18

  • Country: US