Pacific Symposium on Biocomputing

Extracting allelic read counts from 250,000 human sequencing runs in Sequence Read Archive

Abstract

The Sequence Read Archive (SRA) contains over one million publicly available sequencing runs from many studies that use a variety of sequencing library strategies. These data inherently carry information about the underlying genomic sequence variants, which we exploit to extract allelic read counts on an unprecedented scale. We reprocessed over 250,000 human sequencing runs (more than 1,000 TB of raw sequence data) into a single unified dataset of allelic read counts for nearly 300,000 variants of biomedical relevance curated by NCBI dbSNP. Germline variants were detected in a median of 912 sequencing runs and somatic variants in a median of 4,876 sequencing runs, suggesting that this dataset facilitates identification of sequencing runs that harbor variants of interest. Allelic read counts obtained using a targeted alignment were very similar to those obtained from whole-genome alignment. Analyzing allelic read counts for matched DNA and RNA samples from tumors, we find that RNA-seq can also recover variants identified by whole-exome sequencing (WXS), suggesting that reprocessed allelic read counts can support variant detection across the different library strategies in SRA. This study provides a rich database of known human variants across SRA samples that can support future meta-analyses of human sequence variation.
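An allelic read count is the number of aligned reads at a known variant position that support each allele. The sketch below illustrates the idea for a single SNV: it counts reads carrying the reference and alternate base at one position and then applies a simple presence rule. It is a minimal illustration under stated assumptions, not the authors' pipeline: the `AlignedRead` type, the ungapped-alignment simplification, and the `min_alt`/`min_vaf` thresholds in `is_detected` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedRead:
    # Hypothetical minimal read record: 0-based reference start and bases.
    # Assumes an ungapped alignment (no indels, no clipping).
    start: int
    seq: str


def allelic_read_counts(reads, pos, ref, alt):
    """Return (ref_count, alt_count) at 0-based reference position `pos`.

    Reads that do not overlap `pos` are skipped; bases matching neither
    allele (e.g. sequencing errors) are tallied but not returned.
    """
    counts = Counter()
    for r in reads:
        offset = pos - r.start
        if 0 <= offset < len(r.seq):
            counts[r.seq[offset].upper()] += 1
    return counts[ref.upper()], counts[alt.upper()]


def is_detected(ref_count, alt_count, min_alt=2, min_vaf=0.05):
    """Hypothetical rule: variant present if enough alt reads and allele fraction."""
    depth = ref_count + alt_count
    if depth == 0:
        return False
    return alt_count >= min_alt and alt_count / depth >= min_vaf


if __name__ == "__main__":
    reads = [
        AlignedRead(100, "ACGTA"),
        AlignedRead(102, "GTACC"),
        AlignedRead(101, "CTTAC"),
        AlignedRead(98, "TTACT"),
        AlignedRead(103, "ACGT"),  # starts after pos 102, so it is ignored
    ]
    ref_n, alt_n = allelic_read_counts(reads, pos=102, ref="G", alt="T")
    print(ref_n, alt_n, is_detected(ref_n, alt_n))  # prints "2 2 True"
```

In this toy example, two reads carry the reference `G` and two carry the alternate `T` at position 102, so the hypothetical rule reports the variant as present. Working on real SRA data would additionally require CIGAR-aware parsing of BAM records and base- and mapping-quality filters.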