...
Nucleic Acids Research

Learning common and specific patterns from data of multiple interrelated biological scenarios with matrix factorization


Abstract

High-throughput biological technologies (e.g. ChIP-seq, RNA-seq and single-cell RNA-seq) are rapidly accelerating the accumulation of genome-wide omics data in diverse interrelated biological scenarios (e.g. cells, tissues and conditions). Integration and differential analysis are two common paradigms for exploring and analyzing such data. However, current integrative methods usually ignore the differential part, and typical differential analysis methods either fail to identify combinatorial patterns of difference or require the data to have matched dimensions. Here, we propose a flexible framework, CSMF, that combines the two into one paradigm and simultaneously reveals Common and Specific patterns via Matrix Factorization from data generated under interrelated biological scenarios. We demonstrate the effectiveness of CSMF with four representative applications: pairwise ChIP-seq data describing the chromatin modification map between the K562 and HUVEC cell lines; pairwise RNA-seq data representing the expression profiles of two different cancers; RNA-seq data of three breast cancer subtypes; and single-cell RNA-seq data of human embryonic stem cell differentiation at six time points. Extensive analysis yields novel insights into hidden combinatorial patterns in these multi-modal data. The results demonstrate that CSMF is a powerful tool for uncovering common and specific patterns with significant biological implications from data of interrelated biological scenarios.
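The abstract does not state the factorization itself, so the following is only a rough sketch of how "common and specific patterns via matrix factorization" could be set up. It assumes that every scenario contributes a nonnegative matrix X_i with the same rows (features) but its own columns, and that each is approximated as W_c H_c,i + W_s,i H_s,i, where W_c is shared across scenarios and W_s,i belongs to scenario i alone. The function names, the ranks `k_common` and `k_specific`, and the multiplicative update rules are illustrative assumptions, not the authors' CSMF implementation.

```python
import numpy as np


def reconstruction_error(Xs, Wc, Ws, Hc, Hs):
    """Sum of squared Frobenius errors over all scenarios."""
    return sum(
        np.linalg.norm(X - Wc @ Hc[i] - Ws[i] @ Hs[i]) ** 2
        for i, X in enumerate(Xs)
    )


def common_specific_nmf(Xs, k_common, k_specific, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical joint NMF: X_i ~ Wc @ Hc[i] + Ws[i] @ Hs[i].

    Xs is a list of nonnegative arrays sharing the same number of rows.
    Wc is the common basis; Ws[i] is the basis specific to scenario i.
    """
    rng = np.random.default_rng(seed)
    m = Xs[0].shape[0]
    Wc = rng.random((m, k_common))
    Ws = [rng.random((m, k_specific)) for _ in Xs]
    Hc = [rng.random((k_common, X.shape[1])) for X in Xs]
    Hs = [rng.random((k_specific, X.shape[1])) for X in Xs]

    for _ in range(n_iter):
        num = np.zeros_like(Wc)
        den = np.zeros_like(Wc)
        for i, X in enumerate(Xs):
            # Update both coefficient blocks of scenario i at once.
            W = np.hstack([Wc, Ws[i]])
            H = np.vstack([Hc[i], Hs[i]])
            H *= (W.T @ X) / (W.T @ W @ H + eps)
            Hc[i], Hs[i] = H[:k_common], H[k_common:]

            # Update the scenario-specific basis.
            R = W @ H
            Ws[i] *= (X @ Hs[i].T) / (R @ Hs[i].T + eps)

            # Accumulate the terms needed for the shared basis.
            R = Wc @ Hc[i] + Ws[i] @ Hs[i]
            num += X @ Hc[i].T
            den += R @ Hc[i].T
        # Update the common basis using all scenarios together.
        Wc *= num / (den + eps)

    return Wc, Ws, Hc, Hs
```

Because the matrices only need to share one dimension, this layout does not require matched column counts across scenarios. A plain sketch like this has no mechanism for choosing the ranks or for keeping the common and specific parts from absorbing each other's signal, so it should be read as an illustration of the decomposition rather than a usable analysis tool.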
