Conference: Annual International Conference on Research in Computational Molecular Biology

EALPS: Estimating Abundance Levels in Pooled Sequencing Using Available Genotyping Data

Abstract

Recent advances in high-throughput sequencing technologies offer the potential for a better characterization of genetic variation in humans and other organisms. On many occasions, either by design or by necessity, the sequencing procedure is performed on a pool of DNA samples with different abundances, where the abundance of each sample is unknown. Such a scenario arises naturally in metagenomics analysis, where a pool of bacteria is sequenced, and in population studies that involve DNA pools by design. In particular, various pooling designs were recently suggested that can identify carriers of rare alleles in large cohorts, dramatically reducing the cost of such large-scale sequencing projects. A fundamental problem with such approaches for population studies is that uncertainty in the DNA proportions contributed by different individuals in the pools might lead to spurious associations. Fortunately, the genotype data of at least some of the individuals in the pool are often known. Here, we propose a method (eALPS) that uses the genotype data in conjunction with the pooled sequence data to accurately estimate the proportions of the samples in the pool, even in cases where not all individuals in the pool were genotyped (eALPS-LD). Using real data from a sequencing pooling study of Non-Hodgkin's Lymphoma, we demonstrate that estimating the proportions is crucial, since otherwise there is a risk of false discoveries. Additionally, we demonstrate that our approach is also applicable to the problem of quantifying species in metagenomics samples (eALPS-BCR), and is particularly suitable for metagenomic quantification of closely related species.
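
The abstract does not spell out the estimation procedure, so the following is only an illustrative sketch of the general idea, not the eALPS algorithm itself. It assumes a simplified model in which every individual in the pool is genotyped at biallelic SNPs (0, 1, or 2 copies of the alternate allele), and the expected alternate-allele fraction of the pooled reads at each SNP is the proportion-weighted average of the individual genotypes divided by 2. Under that assumption the proportions can be estimated by least squares constrained to be non-negative and to sum to 1, solved here by projected gradient descent. The function names `project_to_simplex` and `estimate_proportions`, and the toy data, are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_i >= 0, sum(p) == 1}."""
    sorted_desc = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, value in enumerate(sorted_desc, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        # The condition holds for a prefix of k; keep the last valid shift.
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(genotypes, alt_reads, total_reads, iterations=2000):
    """Estimate pool proportions from genotypes and pooled read counts.

    genotypes[j][i] is the alternate-allele count (0, 1, or 2) of individual i
    at SNP j; alt_reads[j] / total_reads[j] is the observed alternate-allele
    fraction in the pool at SNP j. Assumes at least one non-zero genotype.
    """
    n_individuals = len(genotypes[0])
    # Design matrix: expected pooled fraction is sum_i p_i * g_ji / 2.
    design = [[g / 2.0 for g in row] for row in genotypes]
    observed = [a / t for a, t in zip(alt_reads, total_reads)]
    # 1 / (squared Frobenius norm) is a safe step size for this objective.
    step = 1.0 / sum(a * a for row in design for a in row)
    p = [1.0 / n_individuals] * n_individuals
    for _ in range(iterations):
        residual = [
            sum(a * x for a, x in zip(row, p)) - f
            for row, f in zip(design, observed)
        ]
        gradient = [
            sum(design[j][i] * residual[j] for j in range(len(design)))
            for i in range(n_individuals)
        ]
        p = project_to_simplex([x - step * g for x, g in zip(p, gradient)])
    return p


# Toy example: 3 individuals, 6 SNPs, noise-free read counts at depth 1000.
toy_genotypes = [
    [2, 0, 0],
    [0, 2, 0],
    [0, 0, 2],
    [1, 1, 0],
    [0, 1, 1],
    [2, 0, 1],
]
toy_alt_reads = [500, 300, 200, 400, 250, 600]
toy_total_reads = [1000] * 6
print(estimate_proportions(toy_genotypes, toy_alt_reads, toy_total_reads))
```

The toy read counts were generated from proportions 0.5, 0.3, and 0.2, so the estimate should land close to those values. Real pooled data would add sampling noise, sequencing error, and ungenotyped individuals, which this sketch does not model.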