Conference: IEEE International Conference on Computational Advances in Bio and Medical Sciences

Tumor Copy Number Data Deconvolution Integrating Bulk and Single-cell Sequencing Data


Abstract

Resolving tumor heterogeneity is a crucial step in understanding cancer development and evolution, but it is hampered by the limits of all available data sources. Bulk sequencing has become the most common technology for assessing tumor heterogeneity, but it mixes many genetically distinct cells in each sample, which must then be computationally deconvolved. This genomic deconvolution generally has low resolution and high error rates in reconstructing clonal population structure. Recent technological developments in single-cell sequencing (SCS) offer the potential for high-resolution, whole-genome reconstructions of clonal structure. However, the limitations of SCS, such as high noise, difficulty in scaling to large populations, various challenging technical artifacts, and the large data sets it produces, have so far made it impractical to apply to study cohorts of sufficient size to identify statistically robust features of tumor evolution. To address these problems, we have developed strategies to combine limited amounts of bulk and single-cell data to gain some advantages of single-cell resolution at much lower cost. We specifically focus on the problem of deconvolving copy number data from bulk samples assisted by information from small numbers of SCS sequences. We developed a mixed membership model for clonal deconvolution via Non-Negative Matrix Factorization (NMF) that balances deconvolution quality of the bulk data with similarity to single-cell samples, together with an associated efficient coordinate descent algorithm. We improve on that algorithm by integrating deconvolution with clonal phylogeny inference, using an integer linear programming (ILP) model to add a minimum evolution phylogenetic cost to the problem objective so as to bias deconvolution toward inferred clones that are plausibly related to the observed SCS data. We demonstrate the effectiveness of these methods on semi-simulated data of known ground truth, showing significantly enhanced deconvolution accuracy relative to bulk data alone.
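
The abstract does not state the exact objective, so the following is only a minimal sketch of the general idea of an NMF deconvolution that trades off bulk fit against similarity to single-cell profiles. It assumes a bulk matrix `B` (loci x samples), clone profiles `C` (loci x k), mixture fractions `F` (k x samples, each column on the probability simplex), and single-cell profiles `S` already matched to the first columns of `C`. The weight `alpha`, the function names, and the matching convention are all hypothetical. The solver here is alternating projected gradient descent, swapped in for brevity in place of the coordinate descent algorithm the abstract mentions, and the ILP phylogenetic cost is omitted entirely.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def objective(B, S, C, F, alpha):
    """Bulk reconstruction error plus alpha-weighted distance to SCS profiles."""
    k_s = S.shape[1]
    return np.linalg.norm(B - C @ F) ** 2 + alpha * np.linalg.norm(C[:, :k_s] - S) ** 2

def deconvolve(B, S, k, alpha=1.0, iters=300, seed=0):
    """Illustrative sketch (not the authors' algorithm).

    Assumes S has k_s <= k columns, matched to the first k_s clones.
    Returns clone profiles C, fractions F, and the objective history.
    """
    rng = np.random.default_rng(seed)
    m, n = B.shape
    k_s = S.shape[1]
    C = np.abs(rng.normal(2.0, 0.5, size=(m, k)))  # start near diploid
    C[:, :k_s] = S                                 # matched clones start at SCS profiles
    F = np.full((k, n), 1.0 / k)                   # uniform mixture fractions
    history = [objective(B, S, C, F, alpha)]
    for _ in range(iters):
        # Projected gradient step on C (non-negativity constraint).
        grad_C = 2.0 * (C @ F - B) @ F.T
        grad_C[:, :k_s] += 2.0 * alpha * (C[:, :k_s] - S)
        step_C = 2.0 * (np.linalg.norm(F @ F.T, 2) + alpha)
        C = np.maximum(C - grad_C / step_C, 0.0)
        # Projected gradient step on F (each column stays on the simplex).
        grad_F = 2.0 * C.T @ (C @ F - B)
        step_F = 2.0 * np.linalg.norm(C.T @ C, 2) + 1e-12
        F = np.apply_along_axis(project_simplex, 0, F - grad_F / step_F)
        history.append(objective(B, S, C, F, alpha))
    return C, F, history
```

In this sketch, a larger `alpha` pulls the matched clones closer to the single-cell profiles, while `alpha = 0` reduces the problem to deconvolution from bulk data alone.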
