Nucleic Acids Research

Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework



Abstract

Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for comprehensive downstream analysis. In the present study, we first proposed three novel algorithms to address three fundamental problems: sequence gap-filled gene feature annotation, bit-block encoded genotypes, and sectional fast access to text lines. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, which integrates downstream analysis functions for whole genome sequencing data. KGGSeq is equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In tests with whole genome sequencing data from the 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SnpEff). It took only around half an hour on a small server with 10 CPUs to access the genotypes of ∼60 million variants in 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles, and it calculated genotypic correlation over 1000 times faster.
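
The abstract does not describe the bit-block format itself, so the following is only a generic sketch of why bit-packed genotypes can make correlation cheap. It is not KGGSeq's actual encoding. It assumes biallelic, unphased genotypes coded as alternative-allele counts (0, 1 or 2) with no missing values, and stores each variant as two bit planes. The names `pack`, `popcount` and `correlation` are hypothetical. With this layout, all sums needed for Pearson's r reduce to bitwise ANDs and bit counts.

```python
from math import sqrt


def pack(genotypes):
    """Pack alt-allele counts (0, 1, 2) into two bit planes held as Python ints.

    Assumption for illustration: no missing genotypes, biallelic, unphased.
    """
    has_alt = 0  # bit i is set when subject i carries at least one alt allele
    hom_alt = 0  # bit i is set when subject i carries two alt alleles
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2):
            raise ValueError(f"unsupported genotype code: {g!r}")
        if g >= 1:
            has_alt |= 1 << i
        if g == 2:
            hom_alt |= 1 << i
    return has_alt, hom_alt


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def correlation(a, b, n):
    """Pearson correlation of two packed variants over n subjects.

    Each genotype equals has_alt + hom_alt, so sums of x, x*x and x*y
    can all be obtained from bit counts of the planes and their ANDs.
    """
    a1, a2 = a
    b1, b2 = b
    sum_x = popcount(a1) + popcount(a2)
    sum_y = popcount(b1) + popcount(b2)
    sum_xx = popcount(a1) + 3 * popcount(a2)  # 1 for het, 4 for hom alt
    sum_yy = popcount(b1) + 3 * popcount(b2)
    sum_xy = (popcount(a1 & b1) + popcount(a1 & b2)
              + popcount(a2 & b1) + popcount(a2 & b2))
    cov = n * sum_xy - sum_x * sum_y
    var_x = n * sum_xx - sum_x ** 2
    var_y = n * sum_yy - sum_y ** 2
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a monomorphic variant
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    x = [0, 1, 2, 1, 0, 2]
    y = [2, 1, 0, 1, 2, 0]
    print(correlation(pack(x), pack(y), len(x)))
```

Two bits per genotype is the simplest choice for this restricted case. Representing phase, missing calls or multiple alleles, as the abstract says KGGSeq does, would need more bits per genotype and is not attempted here.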
