Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score

Abstract

Motivation: Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state of the art in mapping uses quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position, and thus measures the overall composition of the genome itself.

Results: We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and it enabled us to identify the 5-14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found that discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the 'dark matter' of the genome, including known clinically relevant variations in these regions.
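
The abstract defines the GMS only in words, as a weighted probability that a read covering a position maps there unambiguously. The sketch below is a deliberately simplified, hypothetical stand-in, not the authors' Genome Mappability Analyzer: it treats every error-free length-k substring as a simulated read, counts a read as unambiguous only if its sequence occurs exactly once in the genome (reverse complements and mismatches are ignored), and scores each position as the unweighted fraction of covering reads that are unambiguous. The function names `kmer_uniqueness_score` and `low_score_fraction`, and the 0.5 threshold for a "poor" score, are illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter


def kmer_uniqueness_score(genome: str, k: int) -> list[float]:
    """Hypothetical, simplified mappability score (not the published GMS).

    Every length-k window is treated as an error-free simulated read.
    A read counts as unambiguous if its exact sequence occurs once in
    the genome; reverse complements and mismatches are ignored.
    Each position gets the fraction of covering reads that are unambiguous.
    """
    n = len(genome)
    if k <= 0 or k > n:
        raise ValueError("k must be in 1..len(genome)")
    starts = range(n - k + 1)
    counts = Counter(genome[s:s + k] for s in starts)
    unique = [counts[genome[s:s + k]] == 1 for s in starts]

    scores = []
    for pos in range(n):
        # Reads starting at s cover pos when s <= pos <= s + k - 1.
        lo = max(0, pos - k + 1)
        hi = min(pos, n - k)
        covering = unique[lo:hi + 1]
        scores.append(sum(covering) / len(covering))
    return scores


def low_score_fraction(scores: list[float], threshold: float = 0.5) -> float:
    """Fraction of positions whose score falls below the (assumed) threshold."""
    return sum(s < threshold for s in scores) / len(scores)


if __name__ == "__main__":
    # "ACGT" occurs twice, so reads starting at 0 and 4 are ambiguous.
    scores = kmer_uniqueness_score("ACGTACGTTT", k=4)
    print([round(s, 2) for s in scores])
    # [0.0, 0.5, 0.67, 0.75, 0.75, 0.75, 0.75, 0.67, 1.0, 1.0]
    print(low_score_fraction(scores))  # 0.1
```

Because this toy model already draws an error-free read from every start position, a position scoring 0 cannot be rescued by sampling more reads: each extra read covering it is one of the same ambiguous sequences. Under these assumptions, that illustrates the abstract's point that such errors cannot be overcome by increasing coverage.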

Bibliographic details

  • Authors

    Lee H.; Schatz M. C.

  • Year: 2012
  • Original format: PDF
