Nucleic Acids Research

svaseq: removing batch effects and other unwanted noise from sequencing data

Abstract

It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference in high-throughput genomic analysis. We introduced surrogate variable analysis (sva) to estimate these artifacts in two steps. First, it identifies the part of the genomic data affected only by artifacts. Second, it estimates the artifacts with the principal components or singular vectors of that subset of the data matrix. The resulting artifact estimates can then be used as adjustment factors in subsequent analyses. Here I describe a version of the sva approach created specifically for count data or FPKMs from sequencing experiments, based on an appropriate data transformation. I also describe the addition of supervised sva (ssva), which uses control probes to identify the part of the genomic data affected only by artifacts. I compare these versions of sva with other methods for batch effect estimation on simulated data, real count-based data and FPKM-based data. These updates are available through the sva Bioconductor package, and fully reproducible analyses using these methods are available from: https://github.com/jtleek/svaseq.
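The abstract describes two sva steps: restrict to the part of the data affected only by artifacts, then take its singular vectors. It also describes using the resulting estimates as adjustment factors afterwards. Both can be sketched in a few lines of NumPy. This is a simplified illustration in the spirit of supervised sva, not the sva package's implementation. The log(x + 1) transform of counts, the function names, the simulated data and the choice of a single surrogate variable are all hypothetical. The control genes are simply assumed to be known and affected only by the batch.

```python
import numpy as np


def estimate_surrogates(counts, control_rows, n_sv=1):
    """Estimate surrogate variables from genes assumed to carry only artifacts.

    counts: genes x samples array of non-negative counts.
    control_rows: indices (or boolean mask) of control genes; in this sketch
        they are simply assumed to be unaffected by the biology of interest.
    Returns a samples x n_sv matrix whose columns are surrogate variables.
    """
    logged = np.log1p(counts)                       # assumed log(x + 1) transform
    ctrl = logged[control_rows]
    ctrl = ctrl - ctrl.mean(axis=1, keepdims=True)  # center each control gene
    # Rows of vt are right singular vectors: patterns across samples.
    _, _, vt = np.linalg.svd(ctrl, full_matrices=False)
    return vt[:n_sv].T


def group_effects(counts, group, sv):
    """Per-gene least-squares coefficient for `group`, adjusting for `sv`."""
    logged = np.log1p(counts)
    design = np.column_stack([np.ones(len(group)), group, sv])
    coef, *_ = np.linalg.lstsq(design, logged.T, rcond=None)
    return coef[1]  # one group coefficient per gene


# Hypothetical simulation: 200 genes, 20 samples, batch partly confounded
rng = np.random.default_rng(0)
n_samples = 20
group = np.repeat([0, 1], 10)
batch = np.array([0] * 7 + [1] * 3 + [0] * 3 + [1] * 7)  # unmodeled artifact
log_mu = rng.uniform(3, 6, size=(200, 1)) + 1.0 * batch   # batch shifts every gene
log_mu[50:100] += 1.0 * group                              # true group effect
counts = rng.poisson(np.exp(log_mu))
controls = np.arange(150, 200)  # assumed batch-only control genes

sv = estimate_surrogates(counts, controls, n_sv=1)
adjusted = group_effects(counts, group, sv)
naive = group_effects(counts, group, np.empty((n_samples, 0)))
null = np.r_[0:50, 100:150]  # genes with no true group effect
print(naive[null].mean(), adjusted[null].mean())
```

In this simulation, batch 1 is more common in the second group (7 of 10 samples versus 3 of 10). The naive fit should therefore report a spurious group effect of roughly 0.4 on the log scale for null genes. Adding the surrogate variable as a covariate should pull that estimate toward zero. The sign of a singular vector is arbitrary, which does not matter when it is used as a regression covariate.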