Journal: Bioinformatics

Compression and fast retrieval of SNP data


Abstract

Motivation: The increasing interest in rare genetic variants and epistatic genetic effects on complex phenotypic traits is currently pushing genome-wide association study design towards datasets of increasing size, both in the number of studied subjects and in the number of genotyped single nucleotide polymorphisms (SNPs). This, in turn, is leading to a compelling need for new methods for compression and fast retrieval of SNP data.

Results: We present a novel algorithm and file format for compressing and retrieving SNP data, specifically designed for large-scale association studies. Our algorithm is based on two main ideas: (i) compress linkage disequilibrium blocks in terms of differences with a reference SNP and (ii) compress reference SNPs exploiting information on their call rate and minor allele frequency. Tested on two SNP datasets and compared with several state-of-the-art software tools, our compression algorithm is shown to be competitive in terms of compression rate and to outperform all tools in terms of time to load compressed data.
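
Idea (i) in the abstract, storing a SNP as its differences from a reference SNP in the same linkage disequilibrium block, can be illustrated with a minimal sketch. This is not the authors' file format or implementation: the genotype coding (0, 1, 2 for minor-allele counts, 3 for a missing call), the function names, and the list-of-pairs layout are all assumptions made for illustration. Idea (ii), the compression of the reference SNPs themselves, is not covered here.

```python
from typing import List, Tuple

MISSING = 3  # hypothetical code for a missing genotype call (assumption)


def encode_against_reference(reference: List[int],
                             snp: List[int]) -> List[Tuple[int, int]]:
    """Keep only (subject index, genotype) pairs where snp differs from reference."""
    if len(reference) != len(snp):
        raise ValueError("both SNPs must cover the same subjects")
    return [(i, g) for i, (r, g) in enumerate(zip(reference, snp)) if r != g]


def decode_against_reference(reference: List[int],
                             diffs: List[Tuple[int, int]]) -> List[int]:
    """Rebuild the full genotype vector from the reference plus stored differences."""
    snp = list(reference)
    for i, g in diffs:
        snp[i] = g
    return snp


# Two SNPs in strong linkage disequilibrium: they differ for one subject only.
reference = [0, 1, 2, 0, 0, 1, MISSING, 0]
neighbour = [0, 1, 2, 0, 1, 1, MISSING, 0]

diffs = encode_against_reference(reference, neighbour)
restored = decode_against_reference(reference, diffs)
```

When SNPs in a block are highly correlated, the difference list is much shorter than the genotype vector, which is the intuition behind compressing a block relative to one reference SNP.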
