METHOD AND APPARATUS FOR SCALABLE PROBABILISTIC CLUSTERING USING DECISION TREES

Abstract

Some embodiments of the invention include methods for identifying clusters in a database, data warehouse or data mart. Each identified cluster can be meaningfully understood through a list of attributes and the corresponding values for that cluster. Some embodiments of the invention include a method for scalable probabilistic clustering using a decision tree. Some embodiments of the invention run in time linear in the size of the set of data and require only a single access to the set of data. Some embodiments of the invention produce interpretable clusters that can be described in terms of a set of attributes and the attribute values for that set of attributes. In some embodiments, a cluster can be interpreted by reading the attributes and attribute values on the path from the root node of the decision tree to the node of the decision tree corresponding to the cluster. In some embodiments, no domain-specific distance function for the attributes is required. In some embodiments, a cluster is determined by identifying the attribute with the highest influence on the distribution of the other attributes. Each of the values assumed by the identified attribute corresponds to a cluster and to a node in the decision tree. In some embodiments, the CUBE operation is used to access the set of data a single time, and the result is used to compute the influence and to perform other calculations.
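The splitting step described in the abstract can be illustrated with a minimal sketch. The abstract does not define "influence", so this sketch assumes it is the sum of an attribute's pairwise mutual information with every other attribute; that choice, the function names, and the sample rows are all hypothetical and are not taken from the patent. The pairwise counts are gathered in one pass over the rows, standing in for a CUBE-style aggregate.

```python
from collections import Counter
from itertools import combinations
from math import log


def pairwise_counts(rows, attrs):
    """One pass over the rows: marginal and pairwise joint value counts.

    This stands in for a CUBE-style aggregate (an assumption of this sketch).
    """
    joint = {pair: Counter() for pair in combinations(attrs, 2)}
    marginal = {a: Counter() for a in attrs}
    n = 0
    for row in rows:
        n += 1
        for a in attrs:
            marginal[a][row[a]] += 1
        for a, b in joint:
            joint[(a, b)][(row[a], row[b])] += 1
    return joint, marginal, n


def mutual_information(joint_ab, marg_a, marg_b, n):
    """Mutual information (in nats) of two attributes, from counts alone."""
    return sum(
        (c / n) * log(c * n / (marg_a[va] * marg_b[vb]))
        for (va, vb), c in joint_ab.items()
    )


def influence_scores(attrs, joint, marginal, n):
    """Assumed influence measure: summed mutual information with all other attributes."""
    score = {a: 0.0 for a in attrs}
    for (a, b), counts in joint.items():
        mi = mutual_information(counts, marginal[a], marginal[b], n)
        score[a] += mi
        score[b] += mi
    return score


def split_on_most_influential(rows, attrs):
    """Pick the highest-scoring attribute; each of its values becomes one cluster."""
    joint, marginal, n = pairwise_counts(rows, attrs)
    score = influence_scores(attrs, joint, marginal, n)
    best = max(attrs, key=lambda a: score[a])
    # Grouping re-reads the in-memory rows purely for illustration.
    clusters = {}
    for row in rows:
        clusters.setdefault(row[best], []).append(row)
    return best, score, clusters


# Hypothetical sample data: "segment" determines both "plan" and "device".
rows = [
    {"plan": "free", "device": "mac", "segment": "s0"},
    {"plan": "free", "device": "pc", "segment": "s1"},
    {"plan": "pro", "device": "mac", "segment": "s2"},
    {"plan": "pro", "device": "pc", "segment": "s3"},
]
best, score, clusters = split_on_most_influential(rows, ["plan", "device", "segment"])
```

In this toy data, "segment" carries information about both other attributes while "plan" and "device" are independent of each other, so "segment" is selected and each of its four values yields one cluster. Applying the same step recursively inside each cluster would grow the tree, and the path of attribute values from the root would then describe each cluster.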

Bibliographic details

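  • Publication number: EP1145184A3

  • Publication date: 2002-04-03

  • Original format: PDF

  • Applicant/patentee: E.PIPHANY INC.

  • Application number: EP20000926484

  • Inventors: SAHAMI MEHRAN; JOHN GEORGE H.

  • Filing date: 2000-04-28

  • Classification: G06K9/00

  • Country: EP

  • Date indexed: 2022-08-22 00:34:26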
