Frontiers in Genetics

A New Machine Learning-Based Framework for Mapping Uncertainty Analysis in RNA-Seq Read Alignment and Gene Expression Estimation


Abstract

One of the main benefits of modern RNA-Sequencing (RNA-Seq) technology is more accurate gene expression estimation compared with previous generations of expression data, such as microarrays. However, numerous issues can cause an RNA-Seq read to map to multiple locations on the reference genome with the same alignment score, which occurs in plant, animal, and metagenome samples. Such a read is called a multiple-mapping read (MMR). The impact of MMRs is reflected in gene expression estimation and in all downstream analyses, including differential gene expression and functional enrichment. Current analysis pipelines lack tools to effectively test the reliability of gene expression estimates and are therefore incapable of ensuring the validity of downstream analyses. Our investigation of 95 RNA-Seq datasets from seven plant and animal species (totaling 1,951 GB) indicates that, on average, roughly 22% of all reads are MMRs. Here we present a machine learning-based tool called GeneQC (Gene expression Quality Control), which can accurately estimate the reliability of each gene's expression level derived from an RNA-Seq dataset. The underlying algorithm is based on extracted genomic and transcriptomic features, which are combined using elastic-net regularization and mixture model fitting to provide a clearer picture of mapping uncertainty for each gene. GeneQC allows researchers to identify reliable expression estimates and to conduct further analysis on gene expression of sufficient quality. It also enables researchers to investigate further re-alignment methods that yield more accurate expression estimates for genes with low reliability. Application of GeneQC reveals a high level of mapping uncertainty in plant samples and limited, but severe, mapping uncertainty in animal samples. GeneQC is freely available at .
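To make the definition of a multiple-mapping read concrete, the following minimal Python sketch flags reads whose best alignment score is tied across more than one location and reports, per gene, the fraction of its reads that are MMRs. This is an illustration of the MMR definition only, not GeneQC's algorithm (which the abstract describes as combining genomic and transcriptomic features through elastic-net regularization and mixture model fitting). The tuple layout, the function names `find_mmrs` and `mmr_fraction_per_gene`, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict

def best_alignments(alignments):
    """Keep only each read's top-scoring alignments.

    `alignments` is an iterable of (read_id, location, gene, score) tuples;
    this layout is a hypothetical one chosen for the example.
    """
    best_score = {}
    for read_id, _location, _gene, score in alignments:
        if read_id not in best_score or score > best_score[read_id]:
            best_score[read_id] = score
    return [a for a in alignments if a[3] == best_score[a[0]]]

def find_mmrs(alignments):
    """Return IDs of reads whose best score occurs at more than one location."""
    locations = defaultdict(set)
    for read_id, location, _gene, _score in best_alignments(alignments):
        locations[read_id].add(location)
    return {read_id for read_id, locs in locations.items() if len(locs) > 1}

def mmr_fraction_per_gene(alignments):
    """For each gene, the fraction of its best-aligned reads that are MMRs."""
    mmrs = find_mmrs(alignments)
    reads_by_gene = defaultdict(set)
    for read_id, _location, gene, _score in best_alignments(alignments):
        reads_by_gene[gene].add(read_id)
    return {gene: len(reads & mmrs) / len(reads)
            for gene, reads in reads_by_gene.items()}

# Toy data (invented for illustration): r1 ties between two loci, so it is an MMR.
alignments = [
    ("r1", "chr1:100", "geneA", 60),
    ("r1", "chr2:500", "geneB", 60),
    ("r2", "chr1:120", "geneA", 60),
    ("r3", "chr2:510", "geneB", 60),
    ("r3", "chr1:130", "geneA", 40),  # lower score, so not a tie
    ("r4", "chr2:520", "geneB", 55),
]
fractions = mmr_fraction_per_gene(alignments)
print(fractions)
```

A per-gene MMR fraction like this is only a crude proxy for mapping uncertainty; it ignores sequence similarity between genes and other features that a tool such as GeneQC may take into account.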
