Journal: Statistics and Its Interface

Iterative subsampling in solution path clustering of noisy big data


Abstract

We develop an iterative subsampling approach to improve the computational efficiency of our previous work on solution path clustering (SPC). The SPC method achieves clustering by concave regularization on the pairwise distances between cluster centers. This clustering method has the important capability to recognize noise and to provide a short path of clustering solutions; however, it is not sufficiently fast for big datasets. Thus, we propose a method that iterates between clustering a small subsample of the full data and sequentially assigning the other data points to attain orders of magnitude of computational savings. The new method preserves the ability to isolate noise, includes a solution selection mechanism that ultimately provides one clustering solution with an estimated number of clusters, and is shown to be able to extract small tight clusters from noisy data. The method's relatively minor losses in accuracy are demonstrated through simulation studies, and its ability to handle large datasets is illustrated through applications to gene expression datasets. An R package, SPClustering, for the SPC method with iterative subsampling is available at http://www.stat.ucla.edu/~zhou/Software.html.
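The loop described in the abstract, clustering a small subsample and then sequentially assigning the remaining points, can be sketched roughly as follows. This is only an illustration in Python, not the SPClustering package and not the SPC method itself: a simple greedy distance-threshold ("leader") clusterer stands in for the concave-regularized SPC step, and the function names and the parameters `radius`, `min_size`, `subsample_size`, and `max_rounds` are assumptions introduced here for the example.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(points, radius):
    """Stand-in clusterer (NOT SPC): a point joins the first center within
    `radius`, and that center becomes the running mean of its members;
    otherwise the point opens a new cluster."""
    centers, members = [], []
    for p in points:
        for i, c in enumerate(centers):
            if dist(p, c) <= radius:
                members[i].append(p)
                n = len(members[i])
                centers[i] = tuple(sum(q[d] for q in members[i]) / n
                                   for d in range(len(p)))
                break
        else:
            centers.append(tuple(p))
            members.append([p])
    return centers, members


def iterative_subsample_cluster(data, subsample_size, radius, min_size,
                                max_rounds=10, seed=0):
    """Hypothetical sketch: alternate between clustering a small subsample
    and assigning the remaining points; label -1 marks noise."""
    rng = random.Random(seed)
    labels = [-1] * len(data)
    all_centers = []
    remaining = list(range(len(data)))
    for _ in range(max_rounds):
        if len(remaining) < min_size:
            break
        # Step 1: cluster only a small random subsample of unassigned points.
        sample = rng.sample(remaining, min(subsample_size, len(remaining)))
        centers, members = leader_cluster([data[i] for i in sample], radius)
        # Clusters that are too small in the subsample are treated as noise.
        new_centers = [c for c, m in zip(centers, members)
                       if len(m) >= min_size]
        if not new_centers:
            break
        offset = len(all_centers)
        all_centers.extend(new_centers)
        # Step 2: sequentially assign every unassigned point to its nearest
        # new center, but only if it lies within `radius` of that center.
        still_unassigned = []
        for i in remaining:
            j = min(range(len(new_centers)),
                    key=lambda k: dist(data[i], new_centers[k]))
            if dist(data[i], new_centers[j]) <= radius:
                labels[i] = offset + j
            else:
                still_unassigned.append(i)
        remaining = still_unassigned
    return labels, all_centers


# Usage on synthetic data: two tight clusters plus five isolated noise points.
gen = random.Random(1)
data = [(gen.gauss(0, 0.1), gen.gauss(0, 0.1)) for _ in range(50)]
data += [(gen.gauss(5, 0.1), gen.gauss(5, 0.1)) for _ in range(50)]
data += [(10.0, -10.0), (-8.0, 7.0), (20.0, 20.0), (-15.0, -15.0), (2.5, -9.0)]
labels, centers = iterative_subsample_cluster(
    data, subsample_size=30, radius=1.0, min_size=5)
```

Only the subsample is passed to the (potentially expensive) clustering step, while every other point costs just one distance computation per current center. That is the source of the computational savings the abstract refers to. The solution selection mechanism mentioned in the abstract is not modeled here.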
