Journal: The Annals of Applied Statistics

A UNIFIED STATISTICAL FRAMEWORK FOR SINGLE CELL AND BULK RNA SEQUENCING DATA

Abstract

Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the "dropout" events. A "dropout" happens when the RNA for a gene fails to be amplified prior to sequencing, producing a "false" zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows strength from both data sources and carefully models the dropouts in single cell data, leading to more accurate estimation of cell type specific gene expression profiles. In addition, URSM naturally provides inference on the dropout entries in single cell data that need to be imputed for downstream analyses, as well as on the mixing proportions of different cell types in bulk samples. We adopt an empirical Bayes approach, where parameters are estimated using the EM algorithm and approximate inference is obtained by Gibbs sampling. Simulation results illustrate that URSM outperforms existing approaches both in correcting for dropouts in single cell data and in deconvolving bulk samples. We also demonstrate an application to gene expression data on fetal brains, where our model successfully imputes the dropout genes and reveals cell type specific expression patterns.
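To make the two data-generating ideas in the abstract concrete, the following is a toy simulation sketch in Python. It is not URSM itself: the abstract does not give the model's distributions, so the Poisson counts, the independent Bernoulli dropout indicators, the multinomial bulk draw, and the names simulate_dropout and simulate_bulk are all illustrative assumptions.

```python
import numpy as np


def simulate_dropout(true_counts, dropout_prob, rng):
    """Replace entries with a "false" zero, independently with probability dropout_prob.

    Assumption: dropout is independent of expression level (a simplification).
    """
    dropped = rng.random(true_counts.shape) < dropout_prob
    observed = np.where(dropped, 0, true_counts)
    return observed, dropped


def simulate_bulk(profiles, proportions, depth, rng):
    """Draw bulk read counts from a mixture of cell type profiles.

    profiles: (genes, types) array whose columns each sum to 1.
    proportions: (types,) mixing proportions that sum to 1.
    """
    mixed = profiles @ proportions  # per-gene expression of the mixed tissue
    return rng.multinomial(depth, mixed)


rng = np.random.default_rng(0)
n_genes, n_types, n_cells = 5, 2, 4

# Hypothetical cell type specific profiles, one column per cell type.
profiles = rng.dirichlet(np.ones(n_genes), size=n_types).T

# Hypothetical "true" single cell counts, then observed counts with dropouts.
true_counts = rng.poisson(20.0, size=(n_genes, n_cells))
observed, dropped = simulate_dropout(true_counts, 0.3, rng)

# One bulk sample made of 70% type 0 and 30% type 1.
bulk = simulate_bulk(profiles, np.array([0.7, 0.3]), depth=1000, rng=rng)
```

In this toy setting, imputation would mean recovering true_counts at the positions flagged in dropped, and deconvolution would mean recovering the proportions (0.7, 0.3) from bulk; the abstract states that URSM addresses both jointly through a hierarchical model.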
