
Intra-feature Random Forest Clustering

Abstract

Clustering algorithms are commonly used to find structure in data without being told explicitly what to look for. One key desideratum of a clustering algorithm is that the clusters it identifies from some set of features should generalize well to features that have not been measured. Yeung et al. (2001) introduce a Figure of Merit closely aligned with this desideratum, which they use to evaluate clustering algorithms. Broadly, the Figure of Merit measures the within-cluster variance of features of the data that were withheld from the clustering algorithm. Using this metric, Yeung et al. found no clustering algorithm that reliably outperformed k-means on a suite of real-world datasets (Yeung et al. 2001). This paper presents a novel clustering algorithm, intra-feature random forest clustering (IRFC), that does outperform k-means on a variety of real-world datasets under this metric. IRFC begins by training an ensemble of decision trees of limited depth to predict randomly selected features given the remaining features. It then aggregates the partitions implied by these trees and outputs the number of clusters specified by an input parameter.
