Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

A Unified Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq Data



Abstract

RNA sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Since RNA-seq measurements are relative in nature, between-sample normalization is an essential step in differential expression (DE) analysis. The normalization step of existing DE detection algorithms is usually ad hoc and performed only once prior to DE detection, which may be suboptimal, since normalization should ideally be based on non-DE genes only and thus be coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models and jointly estimated with the regression coefficients. By imposing a sparsity-inducing L1 penalty (or a mixed L1/L2 penalty for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression problem and apply the augmented Lagrangian method to solve it. Simulation and real data studies show that the proposed model and algorithms perform better than or comparably to existing methods in terms of detection power and false-positive rate. The performance gain increases with larger sample size or higher signal-to-noise ratio, and is more significant when a large proportion of genes are differentially expressed in an asymmetric manner.
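
The idea of estimating normalization factors and DE effects jointly can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact model, so the code assumes a simplified two-condition form on log-scale data, y[g, j] = alpha[g] + d[j] + beta[g] * x[j] + noise, with an L1 penalty on beta, and it swaps the augmented Lagrangian method for plain block coordinate descent. The function names, the penalty weight `lam`, and the simulated data are all hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * |z|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def joint_normalize_and_detect(Y, x, lam, n_iter=100):
    """Block coordinate descent for an assumed two-condition model.

    Minimizes 0.5 * sum((Y - alpha - d - beta * xc)**2) + lam * sum(|beta|),
    where xc is the centered condition indicator.

    Y   : (genes, samples) array of log-scale expression values
    x   : (samples,) 0/1 condition indicator (both conditions must occur)
    lam : L1 penalty weight on the per-gene condition effects
    """
    Y = np.asarray(Y, dtype=float)
    xc = np.asarray(x, dtype=float)
    xc = xc - xc.mean()          # centering decouples alpha from beta
    xx = xc @ xc
    n_genes, n_samples = Y.shape
    d = np.zeros(n_samples)      # sample-specific normalization factors
    beta = np.zeros(n_genes)     # per-gene condition effects (sparse)
    for _ in range(n_iter):
        # Gene intercepts: row means of the data after removing d
        # (the beta term drops out because xc sums to zero).
        alpha = (Y - d[None, :]).mean(axis=1)
        # Normalization factors: column means of the remaining residual.
        d = (Y - alpha[:, None] - np.outer(beta, xc)).mean(axis=0)
        # Keep d centered; the shift is absorbed by alpha, so the fit is unchanged.
        shift = d.mean()
        d -= shift
        alpha += shift
        # Condition effects: lasso update, one soft-threshold per gene.
        R = Y - alpha[:, None] - d[None, :]
        beta = soft_threshold(R @ xc, lam) / xx
    return alpha, d, beta


# Simulated example (hypothetical data): 200 genes, 3 vs 3 samples,
# with 10% of genes up-regulated, i.e. asymmetric DE.
rng = np.random.default_rng(0)
x = np.array([0, 0, 0, 1, 1, 1])
true_d = rng.normal(0.0, 0.5, size=6)
true_beta = np.zeros(200)
true_beta[:20] = 3.0
Y = (rng.normal(5.0, 1.0, size=(200, 1)) + true_d[None, :]
     + np.outer(true_beta, x) + rng.normal(0.0, 0.05, size=(200, 6)))

alpha, d, beta = joint_normalize_and_detect(Y, x, lam=0.45)
detected = np.flatnonzero(beta)  # indices of genes flagged as DE
```

In this sketch the genes with nonzero `beta` are the ones flagged as DE, and `d` plays the role of the normalization factors. Because the DE genes are thresholded out of the `d` update rather than averaged into it, one would expect `d` to be less biased than simple per-sample mean centering when DE is asymmetric. The value of `lam` here is picked by hand; the abstract does not say how the penalty weight is tuned.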
