Journal: Genome Biology

Errors in RNA-Seq quantification affect genes of relevance to human disease


Abstract

Background: RNA-Seq has emerged as the standard for measuring gene expression and is an important technique often used in studies of human disease. Gene expression quantification involves comparison of the sequenced reads to a known genomic or transcriptomic reference. The accuracy of that quantification relies on there being enough unique information in the reads to enable bioinformatics tools to accurately assign the reads to the correct gene.

Results: We apply 12 common methods to estimate gene expression from RNA-Seq data and show that there are hundreds of genes whose expression is underestimated by one or more of those methods. Many of these genes have been implicated in human disease, and we describe their roles. We go on to propose a two-stage analysis of RNA-Seq data in which multi-mapped or ambiguous reads can instead be uniquely assigned to groups of genes. We apply this method to a recently published mouse cancer study, and demonstrate that we can extract relevant biological signal from data that would otherwise have been discarded.

Conclusions: For hundreds of genes in the human genome, RNA-Seq is unable to measure expression accurately. These genes are enriched for gene families, and many of them have been implicated in human disease. We show that it is possible to use data that may otherwise have been discarded to measure group-level expression, and that such data contains biologically relevant information.
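The idea of assigning multi-mapped reads to groups of genes can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `two_stage_counts`, the input layout (a mapping from read ID to the genes that read aligns to), and the gene names are all hypothetical, chosen only to show the counting logic. Reads that hit exactly one gene are counted for that gene; reads that hit several genes are counted for the set of genes they hit, so they are kept rather than discarded.

```python
from collections import Counter


def two_stage_counts(read_to_genes):
    """Count reads per gene and per gene group.

    read_to_genes: dict mapping a read ID to an iterable of gene names
    that the read aligns to (hypothetical input layout for illustration).
    Returns (gene_counts, group_counts).
    """
    gene_counts = Counter()   # stage 1: reads that hit exactly one gene
    group_counts = Counter()  # stage 2: reads that hit a set of genes

    for genes in read_to_genes.values():
        gene_set = frozenset(genes)
        if not gene_set:
            continue  # read did not align to any gene
        if len(gene_set) == 1:
            gene_counts[next(iter(gene_set))] += 1
        else:
            # The read is ambiguous between genes but unique to this group.
            group_counts[gene_set] += 1

    return gene_counts, group_counts


# Toy alignments with made-up gene names.
alignments = {
    "read1": ["GENE_A"],
    "read2": ["GENE_B", "GENE_C"],
    "read3": ["GENE_C", "GENE_B"],
    "read4": ["GENE_A"],
    "read5": [],
}
gene_counts, group_counts = two_stage_counts(alignments)
```

Using a `frozenset` as the group key means the order in which an aligner reports the hits does not matter, so `read2` and `read3` fall into the same group. A real analysis would also need to decide how to handle overlapping groups and how to normalise group-level counts, which this sketch leaves out.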
