Iterative subsampling in solution path clustering of noisy big data

Marchetti Yuliya; Zhou Qing


Abstract

We develop an iterative subsampling approach to improve the computational efficiency of our previous work on solution path clustering (SPC). The SPC method achieves clustering by concave regularization on the pairwise distances between cluster centers. This clustering method has the important capability to recognize noise and to provide a short path of clustering solutions; however, it is not sufficiently fast for big datasets. Thus, we propose a method that iterates between clustering a small subsample of the full data and sequentially assigning the other data points to attain orders of magnitude of computational savings. The new method preserves the ability to isolate noise, includes a solution selection mechanism that ultimately provides one clustering solution with an estimated number of clusters, and is shown to be able to extract small tight clusters from noisy data. The method's relatively minor losses in accuracy are demonstrated through simulation studies, and its ability to handle large datasets is illustrated through applications to gene expression datasets. An R package, SPClustering, for the SPC method with iterative subsampling is available at http://www.stat.ucla.edu/~zhou/Software.html.
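The loop described in the abstract (cluster a small subsample, sequentially assign the remaining points, repeat on whatever is still unassigned) can be illustrated with a minimal Python sketch. This is not the authors' algorithm and not the API of the SPClustering package: the clustering step here is a simple radius-based leader clustering used as a stand-in for SPC, and every name and parameter (iterative_subsample_cluster, leader_cluster, radius, sample_size, min_size, max_iter) is hypothetical. Points that never fall within the radius of a retained center keep the label -1, which loosely mirrors the idea of isolating noise.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(points, radius, min_size):
    """Stand-in clustering step (NOT the SPC method).

    Each point joins the nearest existing center if it lies within
    `radius`, otherwise it starts a new center. Centers supported by
    fewer than `min_size` points are discarded as likely noise.
    """
    centers, counts = [], []
    for p in points:
        best = min(range(len(centers)),
                   key=lambda k: dist(p, centers[k]), default=None)
        if best is not None and dist(p, centers[best]) <= radius:
            counts[best] += 1
            n = counts[best]
            # Running-mean update of the center.
            centers[best] = tuple(c + (x - c) / n
                                  for c, x in zip(centers[best], p))
        else:
            centers.append(tuple(p))
            counts.append(1)
    return [c for c, n in zip(centers, counts) if n >= min_size]


def iterative_subsample_cluster(data, radius, sample_size=50,
                                min_size=3, max_iter=10, seed=0):
    """Hypothetical sketch: alternate subsample clustering and assignment."""
    rng = random.Random(seed)
    labels = [-1] * len(data)          # -1 marks unassigned / noise
    centers = []
    unassigned = list(range(len(data)))
    for _ in range(max_iter):
        if len(unassigned) < min_size:
            break
        # Step 1: cluster a small random subsample of unassigned points.
        sample = rng.sample(unassigned, min(sample_size, len(unassigned)))
        new_centers = leader_cluster([data[i] for i in sample],
                                     radius, min_size)
        if not new_centers:
            continue
        centers.extend(new_centers)
        # Step 2: sequentially assign every unassigned point to its
        # nearest center, if that center is close enough.
        still_unassigned = []
        for i in unassigned:
            k = min(range(len(centers)),
                    key=lambda j: dist(data[i], centers[j]))
            if dist(data[i], centers[k]) <= radius:
                labels[i] = k
            else:
                still_unassigned.append(i)
        unassigned = still_unassigned
    return labels, centers
```

Calling iterative_subsample_cluster(data, radius=1.0, sample_size=20) on a list of coordinate tuples returns one label per point and the list of retained centers. Only the subsample is ever clustered, so the expensive step runs on at most sample_size points per iteration; the assignment pass costs one distance computation per point and center.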


Bibliographic details

Source: Statistics and Its Interface, 2016, Issue 4, 17 pages
Authors: Marchetti Yuliya; Zhou Qing
Format: PDF
Language: English
Classification (Chinese Library Classification): Statistics
Keywords: Big data; Clustering; Sparse regularization; Subsampling
Date indexed: 2022-08-18 15:14:19

Similar literature

1. Iterative subsampling in solution path clustering of noisy big data [J]. Marchetti Yuliya, Zhou Qing. Statistics and Its Interface. 2016, Issue 4
2. A theoretical investigation on moving average filtering solution for fixed-path map matching of noisy position data [J]. Alagoz Baris Baykant, Erturkler Metin, Yeroglu Celaleddin. International Journal of Sensor Networks. 2019, Issue 4
3. Subsampling-based acceleration of simple linear iterative clustering for superpixel segmentation [J]. Kang-Sun Choi, Ki-Won Oh. Computer Vision and Image Understanding. 2016, May issue
4. Limited-angle reconstruction from noisy data using clustering of the solution space [C]. Feder, M., Jaffe, . 1989
5. Solution Path Clustering with Minimax Concave Penalty and Its Applications to Noisy Big Data [D]. Marchetti, Yuliya. 2014
6. PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data [O]. Huiyu Xia, Wei Huang, Ning Li, 2019
7. Iterative Subsampling in Solution Path Clustering of Noisy Big Data [O]. Marchetti, Yuliya, Zhou, Qing. 2015
