IEEE/ACM Transactions on Computational Biology and Bioinformatics

A Unified Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq Data

Abstract

RNA sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Because RNA-seq measurements are relative in nature, between-sample normalization is an essential step in differential expression (DE) analysis. In existing DE detection algorithms, the normalization step is usually ad hoc and performed only once, prior to DE detection. This may be suboptimal, since normalization should ideally be based on non-DE genes only and thus be coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models and estimated jointly with the regression coefficients. By imposing a sparsity-inducing L1 penalty (or a mixed L1/L2 penalty for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression problem and apply the augmented Lagrangian method to solve it. Simulation and real data studies show that the proposed model and algorithms perform better than or comparably to existing methods in terms of detection power and false-positive rate. The performance gain increases with larger sample size or higher signal-to-noise ratio, and is more significant when a large proportion of genes are differentially expressed in an asymmetric manner.
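
The joint estimation idea can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: a two-condition design, log-scale expression values, and a model of the form y[g, j] = a[g] + d[j] + b[g] * x[j] + noise, where d[j] is the sample-specific normalization offset and b[g] is the L1-penalized condition effect. The abstract says the penalized least-squares problem is solved with the augmented Lagrangian method; for brevity this sketch swaps in plain block coordinate descent instead. All function names, variable names, and the synthetic data are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * |z|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def joint_normalize_and_detect(Y, x, lam, n_iter=100):
    """Alternately fit sample offsets d and L1-penalized gene effects b.

    Y   : (genes, samples) array of log-scale expression (assumed input scale)
    x   : (samples,) array of 0/1 condition labels (two conditions assumed)
    lam : L1 penalty weight on the per-gene effects b (hand-picked here)
    """
    xc = x - x.mean()             # centered design, orthogonal to the intercept
    a = Y.mean(axis=1)            # per-gene baseline
    b = np.zeros(Y.shape[0])      # per-gene condition effect, starts at zero
    for _ in range(n_iter):
        # Offsets: mean residual per sample, centered so that sum(d) = 0.
        d = (Y - a[:, None] - np.outer(b, xc)).mean(axis=0)
        d -= d.mean()
        # Gene-wise lasso with an unpenalized intercept; closed form
        # because the centered design xc is orthogonal to the intercept.
        R = Y - d
        a = R.mean(axis=1)
        b = soft_threshold(R @ xc, lam) / (xc @ xc)
    return a, b, d


# Synthetic example (made-up data): 100 of 500 genes are up-regulated,
# an asymmetric pattern, and treated samples carry larger offsets.
rng = np.random.default_rng(0)
n_genes, n_de = 500, 100
x = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
d_true = np.array([0.3, -0.2, 0.1, 0.0, 0.8, 0.6, 0.9, 0.7])
beta_true = np.zeros(n_genes)
beta_true[:n_de] = 2.0
mu = rng.uniform(2.0, 8.0, n_genes)
noise = 0.1 * rng.standard_normal((n_genes, x.size))
Y = mu[:, None] + d_true + np.outer(beta_true, x) + noise

a_hat, b_hat, d_hat = joint_normalize_and_detect(Y, x, lam=1.0)
print("genes flagged as DE:", np.count_nonzero(b_hat))
```

In this sketch a nonzero entry of b_hat marks a gene as DE, and d_hat plays the role of the normalization factors. Because the offsets are refit after each thresholding step, they are driven mainly by genes whose effect has been shrunk to zero, which is the coupling between normalization and DE detection that the abstract describes. The penalty also shrinks the surviving effects, so the nonzero values of b_hat underestimate the true effect sizes.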
