Journal: Frontiers in Microbiology

Cautions about the reliability of pairwise gene correlations based on expression data

Abstract

Background: Rapid growth in the availability of genome-wide transcript abundance levels through gene expression microarrays and RNAseq promises to provide deep biological insights into the complex, genome-wide transcriptional behavior of single-celled organisms. However, this promise has not yet been fully realized.

Results: We find that computing pairwise gene associations (correlation; mutual information) across a total of 2782 genome-wide expression samples from six diverse bacteria produces unexpectedly large variation in estimates of pairwise gene association, regardless of the metric used, the organism under study, or the number and source of the samples. We trace the cause to sampling bias. In particular, in repositories of expression data (e.g., Gene Expression Omnibus, GEO), many individual genes show small differences in absolute gene expression levels across the set of samples. We demonstrate that these small differences are due mainly to "noise" rather than "signal" attributable to environmental or genetic perturbations. We show that downstream analysis using the expression levels of genes with small differences yields biased estimates of pairwise association.

Conclusions: We propose flagging genes with small differences in absolute, RMA-normalized expression levels (e.g., standard deviation less than 0.5) as potentially yielding biased pairwise association metrics. This strategy has the potential to substantially improve confidence in genome-wide conclusions about transcriptional behavior in bacterial organisms. Further work is needed to refine strategies for identifying genes with small differences in expression levels prior to computing gene-gene association metrics.
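
The flagging step proposed in the Conclusions could be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function names, the dictionary layout, and the toy values are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the abstract does not say which estimator was used. The 0.5 cutoff is the example value given in the abstract, and Pearson correlation stands in for whichever association metric is of interest.

```python
from statistics import mean, stdev

# Example cutoff quoted in the abstract for RMA-normalized expression levels.
SD_THRESHOLD = 0.5


def flag_low_variation_genes(expression, threshold=SD_THRESHOLD):
    """Return the set of genes whose standard deviation across samples is below threshold.

    `expression` maps a gene name to a list of expression levels, one per sample.
    Assumption: sample (n-1) standard deviation; the abstract does not specify.
    """
    return {gene for gene, levels in expression.items() if stdev(levels) < threshold}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom > 0 else float("nan")


def pairwise_correlations(expression, threshold=SD_THRESHOLD):
    """Compute every gene-gene correlation and mark pairs that involve a flagged gene.

    Returns {(gene1, gene2): (r, is_flagged)}; flagged pairs are kept, not dropped,
    so they can be inspected or excluded downstream.
    """
    flagged = flag_low_variation_genes(expression, threshold)
    genes = sorted(expression)
    results = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expression[g1], expression[g2])
            results[(g1, g2)] = (r, g1 in flagged or g2 in flagged)
    return results


# Hypothetical toy data: geneC barely changes across samples, so it gets flagged.
toy_expression = {
    "geneA": [6.0, 8.0, 10.0, 12.0],
    "geneB": [5.0, 7.5, 9.0, 11.5],
    "geneC": [7.0, 7.1, 6.9, 7.0],
}

for pair, (r, is_flagged) in pairwise_correlations(toy_expression).items():
    print(pair, round(r, 3), "FLAGGED" if is_flagged else "ok")
```

Marking pairs rather than discarding them follows the abstract's wording ("flagging"); whether flagged pairs should later be excluded or down-weighted is left open here, as it is in the abstract.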