Nucleic Acids Research (U.S. National Institutes of Health literature)

svaseq: removing batch effects and other unwanted noise from sequencing data

Abstract

It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference when performing high-throughput genomic analysis. We introduced surrogate variable analysis (sva) for estimating these artifacts by (i) identifying the part of the genomic data only affected by artifacts and (ii) estimating the artifacts with principal components or singular vectors of the subset of the data matrix. The resulting estimates of artifacts can be used in subsequent analyses as adjustment factors to correct analyses. Here I describe a version of the sva approach specifically created for count data or FPKMs from sequencing experiments based on appropriate data transformation. I also describe the addition of supervised sva (ssva) for using control probes to identify the part of the genomic data only affected by artifacts. I present a comparison between these versions of sva and other methods for batch effect estimation on simulated data, real count-based data and FPKM-based data. These updates are available through the sva Bioconductor package and I have made fully reproducible analysis using these methods available from: .
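The two steps named in the abstract, (i) restricting to the part of the data affected only by artifacts and (ii) taking singular vectors of that subset, can be illustrated with a minimal NumPy sketch of the supervised case. This is not the implementation in the sva Bioconductor package: the function name `supervised_surrogate_variables`, its parameters, the `log(count + 1)` transformation and the simulated data are all assumptions made for illustration, and the sketch omits the weighting and model-residual steps a full method would need.

```python
import numpy as np


def supervised_surrogate_variables(counts, control_mask, n_sv=1, pseudocount=1.0):
    """Hypothetical helper: estimate artifacts from control features only.

    counts       : (features x samples) array of non-negative counts
    control_mask : boolean array marking features assumed to be affected
                   only by artifacts (e.g. control probes)
    Returns a (samples x n_sv) array of estimated surrogate variables.
    """
    counts = np.asarray(counts, dtype=float)
    control_mask = np.asarray(control_mask, dtype=bool)

    # Assumed transformation: move counts to a log scale before decomposition.
    log_data = np.log(counts + pseudocount)

    # Step (i): keep only the features assumed to carry artifacts alone.
    controls = log_data[control_mask]

    # Remove each feature's mean so the decomposition captures variation
    # across samples rather than overall expression level.
    centered = controls - controls.mean(axis=1, keepdims=True)

    # Step (ii): the leading right singular vectors are the artifact estimates.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_sv].T


# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n_genes, n_samples, n_controls = 200, 12, 50

batch = np.array([0, 1] * 6)        # unwanted artifact, unknown to the method
group = np.array([0] * 6 + [1] * 6)  # biological variable of interest

batch_effect = rng.choice([-1.0, 1.0], size=n_genes)
group_effect = rng.normal(0.0, 1.0, size=n_genes)
group_effect[:n_controls] = 0.0      # controls carry no biological signal

log_mean = 4.0 + np.outer(batch_effect, batch) + np.outer(group_effect, group)
counts = rng.poisson(np.exp(log_mean))
control_mask = np.arange(n_genes) < n_controls

sv = supervised_surrogate_variables(counts, control_mask, n_sv=1)

# The sign of a singular vector is arbitrary, so compare absolute correlation.
batch_correlation = abs(np.corrcoef(sv[:, 0], batch)[0, 1])
print(f"|correlation| between estimated artifact and true batch: {batch_correlation:.3f}")
```

In a downstream analysis, the columns of `sv` would be added to the design matrix alongside the biological variable, which is the sense in which the abstract describes the estimates as adjustment factors.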