Frontiers in Genetics

NormExpression: An R Package to Normalize Gene Expression Data Using Evaluated Methods

Abstract

Data normalization is a crucial step in gene expression analysis because it ensures the validity of downstream analyses. Although many metrics have been designed to evaluate existing normalization methods, different metrics, or the same metric applied to different datasets, yield inconsistent results, particularly for single-cell RNA sequencing (scRNA-seq) data. In the worst case, a method evaluated as the best by one metric is evaluated as the poorest by another metric, or a method evaluated as the best using one dataset is evaluated as the poorest using another dataset. This raises an open problem: principles need to be established to guide the evaluation of normalization methods. In this study, we propose the principle that a normalization method evaluated as the best by one metric should also be evaluated as the best by another metric (the consistency of metrics), and that a method evaluated as the best using scRNA-seq data should also be evaluated as the best using bulk RNA-seq data or microarray data (the consistency of datasets). We then designed a new metric named Area Under normalized CV threshold Curve (AUCVC) and applied it together with another metric, mSCC, to evaluate 14 commonly used normalization methods using both scRNA-seq data and bulk RNA-seq data, satisfying both the consistency of metrics and the consistency of datasets. Our findings pave the way for future studies on the normalization of gene expression data and its evaluation. The raw gene expression data, normalization methods, and evaluation metrics used in this study are included in an R package named NormExpression. NormExpression provides a framework and a fast, simple way for researchers to select the best normalization method for their gene expression data by evaluating different methods (particularly data-driven methods or their own methods) under the principle of the consistency of metrics and the consistency of datasets.
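The consistency principle stated in the abstract can be illustrated with a minimal Python sketch. It is not code from NormExpression (which is an R package) and does not reproduce the AUCVC or mSCC computations. The function names, the method and metric labels, and all scores are hypothetical placeholders, and the sketch assumes that a higher score means a better method.

```python
from typing import Dict

# Scores for one evaluation (a metric or a dataset): method name -> score.
Scores = Dict[str, float]


def best_method(scores: Scores) -> str:
    """Return the top-scoring method (assumption: higher score is better)."""
    return max(scores, key=scores.get)


def is_consistent(evaluations: Dict[str, Scores]) -> bool:
    """Return True if every evaluation selects the same best method."""
    winners = {best_method(scores) for scores in evaluations.values()}
    return len(winners) == 1


# Hypothetical placeholder scores, not results from the study.
agreeing = {
    "metric_1": {"method_A": 0.61, "method_B": 0.74, "method_C": 0.58},
    "metric_2": {"method_A": 0.40, "method_B": 0.52, "method_C": 0.47},
}
conflicting = {
    "metric_1": {"method_A": 0.61, "method_B": 0.74},
    "metric_2": {"method_A": 0.55, "method_B": 0.31},
}

print(is_consistent(agreeing))
print(is_consistent(conflicting))
```

The same check applies to the consistency of datasets if the outer keys name datasets (for example an scRNA-seq dataset and a bulk RNA-seq dataset) instead of metrics.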