Journal: Bioinformatics

Fast randomization of large genomic datasets while preserving alteration counts

Abstract

Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimum number of them must be executed to draw simulated datasets independently under the null model. This number has previously been determined empirically to be a linear function of the total number of variants, which makes the process computationally expensive.

Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, which includes new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in running time relative to existing implementations and bounds, with equivalent P-value computations. We therefore propose BiRewire for studying statistical properties of genomic datasets and of other data that can be modeled as bipartite networks.
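To make the switching-step concrete, here is a minimal Python sketch of the idea described in the abstract. It is an illustration under stated assumptions, not the BiRewire implementation: the function names (`switching_step`, `rewire`, `margins`), the toy matrix, and the choice of `n_steps=200` are all hypothetical, and the analytical lower bound from the paper is not reproduced here. The dataset is assumed to be a binary matrix with genes as rows and patients as columns, which is equivalent to a bipartite network whose edges are the nonzero entries.

```python
import random


def switching_step(matrix, edges, rng):
    """Attempt one switching-step; return True if the swap was applied."""
    # Pick two distinct edges (a, b) and (c, d) uniformly at random.
    i, j = rng.sample(range(len(edges)), 2)
    (a, b), (c, d) = edges[i], edges[j]
    # The swap needs two distinct rows and two distinct columns, and the
    # target cells (a, d) and (c, b) must be empty to avoid duplicate edges.
    if a == c or b == d or matrix[a][d] or matrix[c][b]:
        return False
    # Replace edges (a, b), (c, d) with (a, d), (c, b).
    matrix[a][b] = matrix[c][d] = 0
    matrix[a][d] = matrix[c][b] = 1
    edges[i], edges[j] = (a, d), (c, b)
    return True


def rewire(matrix, n_steps, seed=None):
    """Return a rewired copy of a binary matrix after n_steps attempted swaps."""
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy; the input is untouched
    edges = [(r, c) for r, row in enumerate(m) for c, v in enumerate(row) if v]
    if len(edges) < 2:
        return m  # nothing to swap
    for _ in range(n_steps):
        switching_step(m, edges, rng)
    return m


def margins(matrix):
    """Row sums (per-gene counts) and column sums (per-patient counts)."""
    return [sum(row) for row in matrix], [sum(col) for col in zip(*matrix)]


# Toy dataset (hypothetical): 4 genes x 5 patients, 1 = gene altered in patient.
observed = [
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
]
# n_steps is an arbitrary illustrative value, not the bound from the paper.
randomized = rewire(observed, n_steps=200, seed=0)
```

Each accepted swap removes one alteration from a row and a column and adds one back to the same row and the same column, so `margins(randomized)` equals `margins(observed)`. This is the sense in which the null model preserves patient- and gene-wise mutation rates. How many steps are needed for the result to be effectively independent of the starting matrix is the question the paper's bound addresses.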