Journal of Animal Science

331 Efficient quality control methods for genomic and pedigree data used in routine genomic evaluation


Abstract

Quality control and consistency tests on genotypes and historical pedigree data are applied in routine genomic evaluation and in academic research. Quality control takes longer as more genotypes become available, and this step is a bottleneck in routine evaluation pipelines. For efficient quality control, we developed several algorithms and a computer program, QCF90, that support large-scale, biallelic single nucleotide polymorphism (SNP) data. The program is designed to detect unsatisfactory genomic markers and individuals in terms of call rate, marker allele frequencies, duplicate samples, and Mendelian inconsistency in large genomic data sets whose pedigree includes millions of individuals. Duplicated genotypes can be detected using a set of markers. Each SNP genotype is packed into a 2-bit representation in memory, which enables bitwise operations with parallel computing to perform the quality control efficiently. The software optionally checks the pedigree information for inconsistencies. We compared QCF90 with preGSf90, a preceding program, in terms of memory usage and computing time using a data set of 200,000 genotyped individuals, 50,000 SNP markers per individual, and 216,500 pedigree individuals. Using only 1 thread, QCF90 was approximately 6 times faster than preGSf90 in total running time (307 s vs 2075 s), and its memory usage was about 30 times lower (2 GB vs 75 GB). QCF90 ran faster as more threads were used. A check for genomic duplicates took 159 s with 16 threads when 5,000 genotypes were compared with 200,000 genotypes using 2,500 SNP markers. The new tool is useful in routine genomic evaluation and in academic research in which both genotypes and pedigree information are used. The QCF90 executable is available at http:/ce.ads.uga.edu with a user manual.
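The 2-bit packing and bitwise checks mentioned in the abstract can be illustrated with a minimal Python sketch. This is not QCF90's implementation, which the abstract does not describe in detail. The genotype encoding (0, 1, 2 for allele counts, 3 for a missing call), the function names, and the use of NumPy are all assumptions made for illustration. The sketch packs four genotypes into each byte, then computes a call rate, counts opposing homozygotes between two individuals (one common form of Mendelian conflict between parent and offspring), and counts mismatching calls as a simple duplicate test.

```python
import numpy as np

MISSING = 3   # assumed code for a missing call; 0/1/2 = copies of the counted allele
LOW = 0x55    # mask selecting the low bit of each 2-bit field in a byte

def pack(codes):
    """Pack genotype codes (0, 1, 2 or MISSING) into bytes, four codes per byte."""
    g = np.asarray(codes, dtype=np.uint8)
    pad = (-g.size) % 4                       # pad to a multiple of 4 with MISSING
    g = np.concatenate([g, np.full(pad, MISSING, dtype=np.uint8)]).reshape(-1, 4)
    return g[:, 0] | (g[:, 1] << 2) | (g[:, 2] << 4) | (g[:, 3] << 6)

def popcount(p):
    """Total number of set bits in a uint8 array."""
    return int(np.unpackbits(p).sum())

def called_mask(p):
    """Low bit set for every 2-bit field that is not MISSING (binary 11)."""
    return ~(p & (p >> 1)) & LOW

def call_rate(p, n_markers):
    """Fraction of the n_markers genotypes that are not missing."""
    return popcount(called_mask(p)) / n_markers

def opposing_homozygotes(a, b):
    """Count markers where one individual is 0 and the other is 2."""
    x = a ^ b
    # High bits differ (00 vs 10) and both low bits are clear (neither is 1 or 3).
    return popcount((x >> 1) & ~(a | b) & LOW)

def mismatches(a, b):
    """Count markers called in both individuals whose genotypes differ."""
    x = a ^ b
    return popcount((x | (x >> 1)) & called_mask(a) & called_mask(b))

# Hypothetical five-marker genotypes for two individuals.
parent = pack([0, 2, 1, 3, 2])
child = pack([2, 2, 1, 0, 0])
print(call_rate(parent, 5))                  # 0.8
print(opposing_homozygotes(parent, child))   # 2
print(mismatches(parent, child))             # 2
```

Because padding uses the missing code, the unused fields in the last byte never count as called genotypes, conflicts, or mismatches. Each byte-wise operation handles four markers at once, and the per-pair comparisons are independent, which is the property that makes this layout amenable to multithreading.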
