Published in: Statistics and Computing

Bayesian nonparametric clustering for large data sets


Abstract

We propose two nonparametric Bayesian methods to cluster big data and apply them to cluster genes by patterns of gene-gene interaction. Both approaches define model-based clustering with nonparametric Bayesian priors and include an implementation that remains feasible for big data. The first method is based on a predictive recursion that requires a single cycle (or a few cycles) of simple deterministic calculations for each observation under study. The second scheme is an exact method that divides the data into smaller subsamples and involves local partitions that can be determined in parallel. In a second step, the method requires only the sufficient statistics of each of these local clusters to derive global clusters. On simulated and benchmark data sets, the proposed methods compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes and an EM algorithm. We apply the proposed approaches to cluster a large data set of gene-gene interactions extracted from the online search tool Zodiac.
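The two-step structure of the second scheme (local partitions on subsamples, then global clusters built only from the local sufficient statistics) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no model details, so the nonparametric Bayesian prior is replaced here by a simple greedy distance-threshold rule (similar in spirit to DP-means), and the sufficient statistics are assumed to be just a count and a per-dimension sum. The names `Stats`, `greedy_cluster`, `two_step_cluster` and the `threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Assumed sufficient statistics of one cluster: size and per-dimension sum."""
    n: int
    total: list

    def mean(self):
        return [s / self.n for s in self.total]

    def merge(self, other):
        # Merging two clusters only needs their statistics, not the raw points.
        return Stats(self.n + other.n,
                     [a + b for a, b in zip(self.total, other.total)])


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_cluster(items, threshold):
    """Stand-in clustering rule (NOT the paper's Bayesian model).

    Each item joins the nearest existing cluster if the squared distance
    between means is below `threshold`; otherwise it opens a new cluster.
    """
    clusters = []
    for item in items:
        m = item.mean()
        best, best_d = None, threshold
        for k, c in enumerate(clusters):
            d = sq_dist(m, c.mean())
            if d < best_d:
                best, best_d = k, d
        if best is None:
            clusters.append(Stats(item.n, list(item.total)))
        else:
            clusters[best] = clusters[best].merge(item)
    return clusters


def two_step_cluster(data, n_shards, threshold):
    """Step 1: cluster each subsample; step 2: cluster the local clusters."""
    shards = [data[i::n_shards] for i in range(n_shards)]
    local = []
    for shard in shards:
        # Shards are independent, so this loop could run in parallel.
        points = [Stats(1, list(x)) for x in shard]
        local.extend(greedy_cluster(points, threshold))
    # The global step sees only the local sufficient statistics.
    return greedy_cluster(local, threshold)
```

Calling `two_step_cluster(data, n_shards=4, threshold=9.0)` on a list of numeric tuples returns one `Stats` object per global cluster. The point of the sketch is the data flow: raw observations are touched only inside each shard, and the second step works on a much smaller set of summaries.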