Published in: Journal of Statistical Software

RSKC: An R Package for a Robust and Sparse K-Means Clustering Algorithm


Abstract

Witten and Tibshirani (2010) proposed an algorithm, called sparse K-means (SK-means), that simultaneously finds clusters and selects clustering variables. SK-means is particularly useful when the dataset has a large fraction of noise variables (that is, variables without useful information to separate the clusters). SK-means works very well on clean and complete data but cannot handle outliers or missing data. To remedy these problems, we introduce a new robust and sparse K-means clustering algorithm (RSK-means), implemented in the R package RSKC. We demonstrate the use of our package on four datasets. We also conduct a Monte Carlo study to compare the performance of RSK-means and SK-means regarding the selection of important variables and the identification of clusters. Our simulation study shows that RSK-means performs well on clean data and better than SK-means and other competitors on outlier-contaminated data.
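
The abstract does not spell out how variables are selected. As a rough illustration only, the sketch below shows the variable-weighting step of sparse K-means as described by Witten and Tibshirani (2010): each variable's between-cluster sum of squares is soft-thresholded and rescaled so that the weight vector has unit L2 norm and an L1 norm no larger than a chosen bound. This is not the RSKC implementation and does not include the robust (trimming) part of RSK-means; the function names, the bisection tolerance, and the example numbers are all hypothetical.

```python
import math


def soft_threshold(values, delta):
    """Shrink each value toward zero by delta, clipping at zero."""
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    """Rescale a non-negative vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [v / norm for v in values]


def sparse_weights(bcss, l1_bound, tol=1e-10):
    """Variable weights from per-variable between-cluster sums of squares.

    Assumes every entry of bcss is non-negative, l1_bound >= 1, and the
    largest entry of bcss is unique (otherwise a bound below sqrt(2)
    cannot be met).
    """
    weights = l2_normalize(bcss)
    if sum(weights) <= l1_bound:
        return weights  # the L1 bound is already satisfied without shrinkage

    # Bisection on the threshold: a larger threshold gives a sparser vector
    # with a smaller L1 norm after rescaling.
    low, high = 0.0, max(bcss)
    while high - low > tol:
        mid = (low + high) / 2.0
        candidate = l2_normalize(soft_threshold(bcss, mid))
        if sum(candidate) > l1_bound:
            low = mid
        else:
            high = mid
    return l2_normalize(soft_threshold(bcss, high))


# Hypothetical example: two informative variables and two noise variables.
example_weights = sparse_weights([10.0, 8.0, 0.5, 0.3], l1_bound=1.2)
print(example_weights)
```

In this made-up example the two noise variables receive a weight of exactly zero, which is the sense in which the method "selects" clustering variables. A smaller L1 bound yields a sparser weight vector.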
