Source: PLoS Clinical Trials (U.S. National Institutes of Health literature)

Normalization of High Dimensional Genomics Data Where the Distribution of the Altered Variables Is Skewed

Abstract

Genome-wide analysis of gene expression or protein binding patterns using different array or sequencing based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several types of experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increase. We propose the following workflow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate whether the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the workflow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods.
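The bias that motivates steps (2) and (3) can be illustrated with a small simulation. The sketch below is not the DSE-test or the HMM-assisted procedure from the paper. It assumes a toy setting in which 30% of the variables are shifted upward by the treatment, uses plain median centering as a stand-in for a standard normalization, and then re-centers on the variables known to be unaltered. All names (`simulate`, `median_center`, `mean_unaltered`) and parameter values are hypothetical. In real data the unaltered set is unknown and would have to be estimated, which is the role the abstract assigns to the HMM.

```python
import random
import statistics


def simulate(n=2000, altered_fraction=0.3, effect=1.0, tech_bias=0.5, seed=0):
    """Toy log-ratios: noise + a technical offset, plus a one-sided effect
    on the altered variables (a skewed experiment). All values are assumptions."""
    rng = random.Random(seed)
    n_altered = int(n * altered_fraction)
    altered = [i < n_altered for i in range(n)]
    values = [
        rng.gauss(0.0, 0.2) + tech_bias + (effect if is_alt else 0.0)
        for is_alt in altered
    ]
    return values, altered


def median_center(values, reference=None):
    """Subtract the median of `reference` (default: all values) from each value."""
    shift = statistics.median(values if reference is None else reference)
    return [v - shift for v in values]


def mean_unaltered(values, altered):
    """Mean of the unaltered variables; ideally 0 after normalization."""
    return statistics.mean([v for v, is_alt in zip(values, altered) if not is_alt])


values, altered = simulate()

# Standard approach: center on the median of all variables.
standard = median_center(values)

# Re-normalization: center on the unaltered variables only. The simulation's
# ground-truth labels are used here purely for illustration.
unaltered_values = [v for v, is_alt in zip(values, altered) if not is_alt]
renormalized = median_center(values, reference=unaltered_values)

bias_standard = mean_unaltered(standard, altered)
bias_renormalized = mean_unaltered(renormalized, altered)
print(f"residual bias, standard centering: {bias_standard:.3f}")
print(f"residual bias, re-centered on unaltered: {bias_renormalized:.3f}")
```

Because the altered variables all move in the same direction, they pull the overall median upward, so the unaltered variables end up with a negative residual offset after standard centering. Centering on the unaltered subset removes most of that offset in this toy setting.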
