Scale-Independent Clustering Method with Automatic Variable Selection Based on Trees

Abstract

Clustering is the process of putting observations into groups based on their distance, or dissimilarity, from one another. Measuring distance for continuous variables often requires scaling or monotonic transformation. Determining dissimilarity when observations have both continuous and categorical measurements can be difficult because each type of measurement must be approached differently. We introduce a new clustering method that uses one of three new distance metrics. In a dataset with p variables, we create p trees, one with each variable as the response. Distance is measured by determining on which leaf an observation falls in each tree. Two observations are similar if they tend to fall on the same leaf and dissimilar if they are usually on different leaves. The distance metrics are not affected by scaling or transformations of the variables and easily determine distances in datasets with both continuous and categorical variables. This method is tested on several well-known datasets, both with and without added noise variables, and performs very well in the presence of noise due in part to automatic variable selection. The new distance metrics outperform several existing clustering methods in a large number of scenarios.
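The leaf-based distance described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report's implementation: the abstract does not define its three metrics, so the distance used here (the fraction of trees in which two observations land on different leaves) is only a guess at the simplest form. The tiny regression-tree routine, the function names, and the parameters `max_depth`, `min_leaf`, and `min_rel_gain` are all hypothetical, and only continuous variables are handled.

```python
from statistics import fmean

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, idx, response, predictors, min_leaf):
    """Return (gain, predictor, threshold) of the best variance-reducing split, or None."""
    parent = sse([rows[i][response] for i in idx])
    best = None
    for j in predictors:
        values = sorted({rows[i][j] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [rows[i][response] for i in idx if rows[i][j] <= t]
            right = [rows[i][response] for i in idx if rows[i][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best

def assign_leaves(rows, idx, response, predictors, depth,
                  min_leaf, min_gain, leaves, path="r"):
    """Recursively partition idx and record a leaf label for each observation."""
    split = best_split(rows, idx, response, predictors, min_leaf) if depth > 0 else None
    if split is None or split[0] <= min_gain:
        for i in idx:
            leaves[i] = path  # stop here: every observation in idx shares this leaf
        return
    _, j, t = split
    left = [i for i in idx if rows[i][j] <= t]
    right = [i for i in idx if rows[i][j] > t]
    assign_leaves(rows, left, response, predictors, depth - 1,
                  min_leaf, min_gain, leaves, path + "L")
    assign_leaves(rows, right, response, predictors, depth - 1,
                  min_leaf, min_gain, leaves, path + "R")

def tree_distance_matrix(rows, max_depth=2, min_leaf=2, min_rel_gain=0.1):
    """Fit one tree per variable (as response) and compare leaf memberships."""
    n, p = len(rows), len(rows[0])
    leaf_ids = []
    for response in range(p):
        predictors = [j for j in range(p) if j != response]
        total = sse([r[response] for r in rows])
        leaves = [None] * n
        # The gain threshold is relative to the response's total variation
        # (an assumption of this sketch), so rescaling the response does not change it.
        assign_leaves(rows, list(range(n)), response, predictors, max_depth,
                      min_leaf, min_rel_gain * total, leaves)
        leaf_ids.append(leaves)
    # Assumed metric: fraction of the p trees in which a and b sit on different leaves.
    return [[sum(l[a] != l[b] for l in leaf_ids) / p for b in range(n)]
            for a in range(n)]

# Made-up toy data: two groups of three observations on two variables.
rows = [(1.0, 10.0), (1.2, 11.0), (0.9, 10.5),
        (5.0, 50.0), (5.1, 52.0), (4.8, 49.0)]
dist = tree_distance_matrix(rows)

# Rescale the second variable and recompute the distances.
dist_scaled = tree_distance_matrix([(x, y * 1000) for x, y in rows])
for row in dist:
    print(row)
```

In this sketch a tree that never splits places every observation on the same leaf, so it adds nothing to any pairwise distance. Because splits depend only on the ordering of predictor values and the gain threshold is relative, rescaling a variable leaves the leaf assignments on this toy data unchanged. The resulting matrix could then be passed to any clustering routine that accepts precomputed dissimilarities.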

Bibliographic details

  • Author: Lynch, S K
  • Year: 2014
  • Pages: 1-49
  • Total pages: 49
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Industrial technology
  • Date indexed: 2022-08-29 10:47:54
