
Fast randomization of large genomic datasets while preserving alteration counts

Abstract

Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, which makes the process computationally expensive.

Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, which includes new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirements with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets and in other data that can be modeled as bipartite networks.

Availability and implementation: BiRewire is available on BioConductor.

Supplementary information: Supplementary data are available at Bioinformatics online.
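The switching-step described in the abstract can be illustrated with a minimal Python sketch. This is not the BiRewire implementation (which is an R package) and it does not reproduce the paper's bound. It assumes the dataset is a binary gene-by-patient matrix stored as a list of lists. The function name `switching_steps` and its parameters are hypothetical, and the number of steps `n_steps` is left to the caller. Each step picks two edges of the bipartite network and swaps their endpoints, unless the swap would create a duplicate edge, so row and column sums (gene-wise and patient-wise alteration counts) stay fixed.

```python
import random


def switching_steps(matrix, n_steps, seed=None):
    """Apply n_steps attempted switching-steps to a binary matrix.

    Hypothetical illustration only: `matrix` is a list of equal-length
    lists of 0/1 values. Returns a new matrix with the same row and
    column sums as the input.
    """
    rng = random.Random(seed)
    # Edge list of the bipartite network: one (row, col) pair per 1 entry.
    edges = [(i, j) for i, row in enumerate(matrix)
             for j, value in enumerate(row) if value]
    present = set(edges)
    if len(edges) < 2:
        return [row[:] for row in matrix]

    for _ in range(n_steps):
        # Pick two distinct edges uniformly at random.
        a, b = rng.sample(range(len(edges)), 2)
        (r1, c1), (r2, c2) = edges[a], edges[b]
        # Skip the move if the edges share a node or if the swapped
        # edges already exist; the attempt still counts as a step.
        if r1 == r2 or c1 == c2:
            continue
        if (r1, c2) in present or (r2, c1) in present:
            continue
        # Swap endpoints: (r1,c1),(r2,c2) -> (r1,c2),(r2,c1).
        present.discard((r1, c1))
        present.discard((r2, c2))
        present.add((r1, c2))
        present.add((r2, c1))
        edges[a] = (r1, c2)
        edges[b] = (r2, c1)

    # Rebuild the binary matrix from the rewired edge set.
    rewired = [[0] * len(matrix[0]) for _ in matrix]
    for i, j in present:
        rewired[i][j] = 1
    return rewired


if __name__ == "__main__":
    toy = [[1, 0, 1, 0],
           [0, 1, 0, 0],
           [1, 1, 0, 1]]
    for row in switching_steps(toy, n_steps=100, seed=0):
        print(row)
```

Counting rejected swaps as steps keeps the chain simple to reason about; how many steps are enough to obtain an independent sample is exactly the question the paper's lower bound addresses, and this sketch does not answer it.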
