Source: Advances in machine learning (foreign-language conference proceedings)

Building a Decision Cluster Forest Model to Classify High Dimensional Data with Multi-classes


Abstract

This paper proposes a decision cluster forest classification model for high dimensional data with multiple classes. A decision cluster forest (DCF) consists of a set of decision cluster trees. The leaves of each tree are clusters labeled with the same class, and that class is assigned to new objects falling in those clusters. A decision cluster tree is generated by recursively calling a variable weighting k-means algorithm on the subset of the training data that contains the objects of a single class. The m decision cluster trees grown from the subsets of the m classes constitute the decision cluster forest. The Anderson-Darling test is used to determine the stopping condition for tree growth. A DCF classification (DCFC) model is selected from all leaves of the m decision cluster trees in the forest. A series of experiments on both synthetic and real data sets showed that the DCFC model performed better in accuracy and scalability than the single decision cluster tree method and than the k-NN, decision tree, and SVM methods. The new model is particularly suitable for large, high dimensional data with many classes.
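The tree-growing idea in the abstract can be illustrated with a rough sketch. It is not the authors' implementation and makes several assumptions: plain 2-means stands in for the variable weighting k-means algorithm, the Anderson-Darling test is applied to the points projected onto the line joining the two child centers, the threshold 0.752 is the commonly tabulated 5% critical value for normality with estimated mean and variance, every leaf is kept instead of selecting a subset, and new objects are assigned to the nearest leaf center. All function names are hypothetical.

```python
import math
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def two_means(points, rng, iters=20):
    """Plain 2-means (assumed stand-in for variable weighting k-means)."""
    centers = rng.sample(points, 2)
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearest = 0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break
        centers = [mean(g) for g in groups]
    return centers, groups

def anderson_darling(values):
    """Adjusted A^2 statistic for normality (mean and variance estimated)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / (n - 1)
    if var == 0:
        return 0.0
    z = sorted((v - mu) / math.sqrt(var) for v in values)
    # Standard normal CDF, clamped so the logarithms stay finite.
    cdf = [min(max(0.5 * (1 + math.erf(x / math.sqrt(2))), 1e-12), 1 - 1e-12)
           for x in z]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)

def grow(points, label, rng, min_size=8, crit=0.752):
    """Return the leaves (center, label) of one decision cluster tree."""
    leaf = [(mean(points), label)]
    if len(points) < min_size:
        return leaf
    centers, groups = two_means(points, rng)
    if not groups[0] or not groups[1]:
        return leaf
    # Project onto the axis joining the two child centers (assumption).
    axis = [a - b for a, b in zip(centers[0], centers[1])]
    proj = [sum(x * w for x, w in zip(p, axis)) for p in points]
    if anderson_darling(proj) <= crit:
        return leaf  # consistent with one Gaussian cluster: stop growing
    return (grow(groups[0], label, rng, min_size, crit)
            + grow(groups[1], label, rng, min_size, crit))

def build_forest(data, seed=0):
    """Grow one tree per class and pool the leaves of all trees."""
    rng = random.Random(seed)
    leaves = []
    for label, points in data.items():
        leaves += grow(points, label, rng)
    return leaves

def classify(leaves, x):
    """Assign x the class of the nearest leaf center (assumption)."""
    return min(leaves, key=lambda leaf: dist2(leaf[0], x))[1]

# Toy data: class "a" has two separated groups, class "b" has one.
demo_rng = random.Random(1)
def blob(cx, cy, n=30):
    return [(demo_rng.gauss(cx, 0.5), demo_rng.gauss(cy, 0.5)) for _ in range(n)]

data = {"a": blob(0, 0) + blob(10, 0), "b": blob(5, 8)}
leaves = build_forest(data)
print(len(leaves), classify(leaves, (0.0, 0.0)), classify(leaves, (5.0, 8.0)))
```

Because each tree is grown from the objects of a single class, every leaf inherits that class directly, so no label voting is needed inside a tree. The stopping threshold and minimum cluster size are illustrative parameters and would need tuning on real data.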
