Bayesian nonparametric clustering for large data sets

Zuanetti Daiane Aparecida; Mueller Peter; Zhu Yitan; Yang Shengjie; Ji Yuan

Abstract

We propose two nonparametric Bayesian methods to cluster big data and apply them to cluster genes by patterns of gene-gene interaction. Both approaches define model-based clustering with nonparametric Bayesian priors and include an implementation that remains feasible for big data. The first method is based on a predictive recursion that requires a single cycle (or a few cycles) of simple deterministic calculations for each observation under study. The second scheme is an exact method that divides the data into smaller subsamples and involves local partitions that can be determined in parallel. In a second step, the method requires only the sufficient statistics of each of these local clusters to derive global clusters. On simulated and benchmark data sets, the proposed methods compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes and an EM algorithm. We apply the proposed approaches to cluster a large data set of gene-gene interactions extracted from the online search tool Zodiac.

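The first method relies on a predictive recursion. As a rough illustration only, the sketch below implements the generic one-pass predictive recursion associated with Newton for the mixing density g of a mixture, g_i(θ) = (1 - w_i) g_{i-1}(θ) + w_i f(x_i | θ) g_{i-1}(θ) / ∫ f(x_i | θ') g_{i-1}(θ') dθ', and then reads cluster labels off the estimated density. It is not the authors' algorithm: the one-dimensional Gaussian kernel with known σ, the uniform grid, the weights w_i = (i + 1)^(-γ), and the level-set labelling rule are all assumptions, and the function names predictive_recursion and pr_clusters are hypothetical.

```python
import numpy as np


def predictive_recursion(x, grid, sigma=1.0, gamma=0.67):
    """One pass of Newton-style predictive recursion for the mixing density g
    of a 1-D Gaussian location mixture x_i ~ N(theta, sigma^2), theta ~ G.
    g is stored as density values on a uniform grid (an assumption)."""
    x = np.asarray(x, dtype=float)
    dx = grid[1] - grid[0]
    g = np.full(grid.shape, 1.0 / (grid.size * dx))  # flat initial guess, integrates to 1
    for i, xi in enumerate(x, start=1):
        w = (i + 1.0) ** (-gamma)  # decreasing weights; gamma in (0.5, 1]
        # posterior of theta given x_i under the current g (kernel constants cancel)
        post = np.exp(-0.5 * ((xi - grid) / sigma) ** 2) * g
        post /= post.sum() * dx
        g = (1.0 - w) * g + w * post  # convex combination keeps g a density
    return g


def pr_clusters(x, grid, g, sigma=1.0, level=0.01):
    """Hypothetical read-out rule: a cluster is a connected run of grid points
    where g > level * max(g); each observation joins the run containing the
    grid point that maximizes g(theta) * N(x; theta, sigma^2)."""
    x = np.asarray(x, dtype=float)
    above = g > level * g.max()
    # a new run starts wherever `above` switches from False to True
    starts = above & ~np.concatenate(([False], above[:-1]))
    run_id = np.where(above, np.cumsum(starts) - 1, -1)
    log_g = np.full(g.shape, -np.inf)
    log_g[above] = np.log(g[above])
    # (n_obs, n_grid) log-scores; grid points outside every run score -inf
    score = log_g[None, :] - 0.5 * ((x[:, None] - grid[None, :]) / sigma) ** 2
    return run_id[np.argmax(score, axis=1)]
```

Each observation is visited once with work proportional to the grid size, which matches the single-cycle property the abstract emphasizes. The one-pass estimate does depend on the order of the data, so one might repeat the pass over a few random orderings and average.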

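The second method needs only the sufficient statistics of the local clusters in its global step. The sketch below is a minimal illustration under assumptions the abstract does not state, showing why that suffices for a conjugate model: with one-dimensional Normal data of known σ, a N(m0, τ²) prior on each cluster mean, and a Chinese-restaurant-process (Dirichlet process) partition prior with concentration alpha, the change in log posterior from merging two local clusters depends only on each cluster's (n, Σx, Σx²). The greedy merge rule and all names (Stats, summarize, log_marginal, merge_gain, greedy_merge) are hypothetical; the paper describes its second method as exact, which a greedy search is not.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    """Sufficient statistics of one cluster of 1-D observations."""
    n: int
    s: float   # sum of x
    ss: float  # sum of x^2

    def __add__(self, other):
        # pooling two clusters just adds their statistics
        return Stats(self.n + other.n, self.s + other.s, self.ss + other.ss)


def summarize(xs):
    return Stats(len(xs), sum(xs), sum(v * v for v in xs))


def log_marginal(st, sigma=1.0, tau=10.0, m0=0.0):
    """log p(cluster data) for x ~ N(mu, sigma^2), mu ~ N(m0, tau^2), mu integrated out."""
    s2, t2, n = sigma ** 2, tau ** 2, st.n
    return (math.log(sigma) - n * math.log(math.sqrt(2 * math.pi) * sigma)
            - 0.5 * math.log(n * t2 + s2) - st.ss / (2 * s2) - m0 ** 2 / (2 * t2)
            + (t2 * st.s ** 2 / s2 + s2 * m0 ** 2 / t2 + 2 * st.s * m0) / (2 * (n * t2 + s2)))


def merge_gain(a, b, alpha=1.0):
    """Change in log posterior of the partition if clusters a and b are merged,
    under a Chinese-restaurant-process (DP) prior with concentration alpha."""
    log_prior = math.lgamma(a.n + b.n) - math.lgamma(a.n) - math.lgamma(b.n) - math.log(alpha)
    log_lik = log_marginal(a + b) - log_marginal(a) - log_marginal(b)
    return log_prior + log_lik


def greedy_merge(local_stats, alpha=1.0):
    """Hypothetical global step: repeatedly merge the pair with the largest
    positive gain; stop when no merge improves the log posterior."""
    clusters = list(local_stats)
    while len(clusters) > 1:
        gain, i, j = max((merge_gain(clusters[i], clusters[j], alpha), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if gain <= 0:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

Each subsample contributes only three numbers per local cluster, so the global step never revisits the raw observations: four subsamples that each return two local clusters hand it eight Stats records instead of the underlying data.
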
Bibliographic record

Source
Statistics and Computing, 2019, no. 2, pp. 203-215 (13 pages)
Authors
Zuanetti Daiane Aparecida; Mueller Peter; Zhu Yitan; Yang Shengjie; Ji Yuan
Author affiliations

Univ Fed Sao Carlos, Dept Estat, Sao Carlos, SP, Brazil;

UT Austin, Dept Math, Austin, TX USA;

NorthShore Univ HealthSyst, Evanston, IL USA;

NorthShore Univ HealthSyst, Evanston, IL USA;

NorthShore Univ HealthSyst, Evanston, IL USA; Univ Chicago, Evanston, IL USA;

Original format: PDF
Language: English
Keywords
Big data clustering; Gene-gene interactions; Predictive recursion; Nonparametric Bayes; TCGA

Similar literature

1. A Differentially Private Big Data Nonparametric Bayesian Clustering Algorithm in Smart Grid [J]. Guan Zhitao, Lv Zefang, Sun Xianwen. IEEE Transactions on Network Science and Engineering, 2020, no. 4.

2. Spherical data clustering and feature selection through nonparametric Bayesian mixture models with von Mises distributions [J]. Wentao Fan, Nizar Bouguila. Engineering Applications of Artificial Intelligence, 2020.

3. A spatio-temporal nonparametric Bayesian variable selection model of fMRI data for clustering correlated time courses [J]. Linlin Zhang, Michele Guindani, Francesco Versace. NeuroImage, 2014.

4. Determinantal Clustering Process - A Nonparametric Bayesian Approach to Kernel Based Semi-Supervised Clustering [C]. Amar Shah, Zoubin Ghahramani. Conference on Uncertainty in Artificial Intelligence, 2013.

5. Bayesian nonparametric models for ranked set sampling [D]. Gemayal, Nader M. 2010.

6. Integrative biclustering of heterogeneous datasets using a Bayesian nonparametric model with application to chemogenomics [O]. Dazhuo Li, Eric C Rouchka. 2011.

7. Nonparametric Hierarchical Bayesian Models for Positive Data Clustering Based on Inverted Dirichlet-Based Distributions [O]. Wentao Fan, Nizar Bouguila. 2019.
