Fast randomization of large genomic datasets while preserving alteration counts

Gobbi Andrea; Iorio Francesco; Dawson Kevin J.; Wedge David C.; Tamborero David; Alexandrov Ludmil B.; Lopez-Bigas Nuria; Garnett Mathew J.; Jurman Giuseppe; Saez-Rodriguez Julio


Abstract

Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, which makes the process computationally expensive.

Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, which includes new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirements with respect to existing implementations and bounds, with equivalent P-value computations. Thus, we propose BiRewire for studying statistical properties of genomic datasets and of other data that can be modeled as bipartite networks.
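
The switching-step and the Monte Carlo P-value computation described above can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not BiRewire's implementation (BiRewire is an R package). The sketch assumes that the dataset is stored as a list of (patient, gene) edges of the bipartite network. Each step picks two edges (r1, c1) and (r2, c2) and swaps them to (r1, c2) and (r2, c1), but only when neither new edge already exists. The names `rewire`, `coverage` and `empirical_p_value` are hypothetical. The number of steps `n_steps` is a free parameter here, and the analytical lower bound from the paper is not implemented. As the test statistic, the sketch uses coverage, the number of patients altered in at least one gene of a set. This is a simple illustrative proxy for mutual exclusivity, not a specific published metric.

```python
import random
from typing import Iterable, List, Set, Tuple

# Hypothetical representation: one edge per alteration, (patient, gene).
Edge = Tuple[int, int]


def rewire(edges: List[Edge], n_steps: int, rng: random.Random) -> List[Edge]:
    """Apply n_steps *attempted* switching-steps to a copy of `edges`.

    Assumes `edges` has no duplicates (i.e. a binary patient-by-gene matrix).
    A switch replaces (r1, c1), (r2, c2) with (r1, c2), (r2, c1), so every
    patient (row) and gene (column) keeps its number of alterations.
    """
    edges = list(edges)
    present: Set[Edge] = set(edges)
    if len(edges) < 2:
        return edges
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        (r1, c1), (r2, c2) = edges[i], edges[j]
        # Reject the switch if it would duplicate an existing edge; this also
        # covers r1 == r2 or c1 == c2, where the switch would be a no-op.
        if (r1, c2) in present or (r2, c1) in present:
            continue
        present.difference_update({(r1, c1), (r2, c2)})
        present.update({(r1, c2), (r2, c1)})
        edges[i], edges[j] = (r1, c2), (r2, c1)
    return edges


def coverage(edges: Iterable[Edge], genes: Set[int]) -> int:
    """Number of patients with at least one alteration in `genes`."""
    return len({r for r, c in edges if c in genes})


def empirical_p_value(edges: List[Edge], genes: Set[int], n_samples: int = 200,
                      n_steps: int = 1000, seed: int = 0) -> float:
    """Monte Carlo P-value for high coverage of `genes` under the null model."""
    rng = random.Random(seed)
    observed = coverage(edges, genes)
    hits = 0
    for _ in range(n_samples):
        # Each null dataset is drawn by rewiring the observed data from scratch.
        null_edges = rewire(edges, n_steps, rng)
        if coverage(null_edges, genes) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_samples + 1)


if __name__ == "__main__":
    # Toy data: 6 patients x 4 genes; genes 0 and 1 are never co-altered.
    toy = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1),
           (0, 2), (1, 2), (3, 2), (4, 2), (5, 2),
           (0, 3), (2, 3), (5, 3)]
    print(coverage(toy, {0, 1}))  # 5
    print(empirical_p_value(toy, {0, 1}))
```

Each null dataset is rewired independently from the observed data, so the total cost grows with `n_steps × n_samples`. For this reason, a tighter bound on the number of switching-steps translates directly into shorter P-value computations.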

Bibliographic information

Source
Bioinformatics | 2014, Issue 17 | 7 pages
Authors
Gobbi Andrea; Iorio Francesco; Dawson Kevin J.; Wedge David C.; Tamborero David; Alexandrov Ludmil B.; Lopez-Bigas Nuria; Garnett Mathew J.; Jurman Giuseppe; Saez-Rodriguez Julio
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Bioengineering (biotechnology)

Similar documents

1. Fast randomization of large genomic datasets while preserving alteration counts [J]. Gobbi Andrea, Iorio Francesco, Dawson Kevin J. Bioinformatics, 2014, Issue 17.

2. Comparison of TCGA and GENIE genomic datasets for the detection of clinically actionable alterations in breast cancer [J]. Pushpinder Kaur, Tania B. Porras, Alexander Ring. Scientific Reports, 2019, Issue 1.

3. A Fourier-Based Data Minimization Algorithm for Fast and Secure Transfer of Big Genomic Datasets [C]. Mohammed Aledhari, Marianne Di Pierro, Fahad Saeed. 2018 IEEE International Congress on Big Data, 2018.

4. Interactive fast random access, retrieval, and navigation of large datasets [D]. Fan, Zihong. 2011.

5. Fast randomization of large genomic datasets while preserving alteration counts [O]. Andrea Gobbi, Francesco Iorio, Kevin J. Dawson.

6. Privacy-preserving GWAS analysis on federated genomic datasets [O]. 2015.
