Conference: International Conference on Research in Computational Molecular Biology
eALPS: Estimating Abundance Levels in Pooled Sequencing Using Available Genotyping Data

Abstract

Recent advances in high-throughput sequencing technologies offer the potential for better characterization of genetic variation in humans and other organisms. On many occasions, either by design or by necessity, sequencing is performed on a pool of DNA samples with different abundances, where the abundance of each sample is unknown. Such a scenario arises naturally in metagenomics analysis, where a pool of bacteria is sequenced, or in population studies involving DNA pools by design. In particular, various pooling designs were recently suggested that can identify carriers of rare alleles in large cohorts, dramatically reducing the cost of such large-scale sequencing projects. A fundamental problem with such approaches for population studies is that uncertainty about the DNA proportions of different individuals in the pools might lead to spurious associations. Fortunately, it is often the case that the genotype data of at least some of the individuals in the pool is known. Here, we propose a method (eALPS) that uses the genotype data in conjunction with the pooled sequence data in order to accurately estimate the proportions of the samples in the pool, even in cases where not all individuals in the pool were genotyped (eALPS-LD). Using real data from a sequencing pooling study of Non-Hodgkin's Lymphoma, we demonstrate that the estimation of the proportions is crucial, since otherwise there is a risk of false discoveries. Additionally, we demonstrate that our approach is also applicable to the problem of quantification of species in metagenomics samples (eALPS-BCR), and is particularly suitable for metagenomic quantification of closely related species.
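
The abstract does not spell out the estimation procedure, so the following is only an illustrative sketch of the underlying idea, not the eALPS algorithm itself. It assumes a simple noise-free linear mixing model in which the pooled alternate-allele frequency at each SNP is the proportion-weighted average of the individuals' genotypes (coded 0, 1, or 2, divided by 2), and it recovers the proportions by least squares constrained to the probability simplex using projected gradient descent. All names (`project_to_simplex`, `estimate_proportions`, `simulate`) and the simulated data are hypothetical and chosen for illustration.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {p : p_i >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold from the last valid index
    return [max(x - theta, 0.0) for x in v]


def estimate_proportions(genotypes, pooled_freqs, iterations=500):
    """Estimate pool proportions by simplex-constrained least squares.

    genotypes[j][i]  : alternate-allele count (0, 1, 2) of individual i at SNP j
    pooled_freqs[j]  : alternate-allele frequency observed in the pool at SNP j
    Assumed model    : pooled_freqs[j] ~= sum_i p[i] * genotypes[j][i] / 2
    """
    n = len(genotypes[0])
    a = [[g / 2.0 for g in row] for row in genotypes]
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so its inverse is a safe (conservative) gradient step size.
    step = 1.0 / sum(x * x for row in a for x in row)
    p = [1.0 / n] * n  # start from equal proportions
    for _ in range(iterations):
        grad = [0.0] * n
        for row, f in zip(a, pooled_freqs):
            residual = sum(x * q for x, q in zip(row, p)) - f
            for i, x in enumerate(row):
                grad[i] += residual * x
        p = project_to_simplex([q - step * g for q, g in zip(p, grad)])
    return p


def simulate(true_p, n_snps, seed=0):
    """Simulated (hypothetical) genotypes and noise-free pooled frequencies."""
    rng = random.Random(seed)
    genotypes = [[rng.choice((0, 1, 2)) for _ in true_p] for _ in range(n_snps)]
    freqs = [sum(p * g / 2.0 for p, g in zip(true_p, row)) for row in genotypes]
    return genotypes, freqs


true_p = [0.5, 0.3, 0.2]
genotypes, freqs = simulate(true_p, n_snps=200)
estimate = estimate_proportions(genotypes, freqs)
print([round(x, 3) for x in estimate])
```

In real pooled sequencing data the observed frequencies come from finite read counts and are noisy, and some individuals may lack genotypes; this sketch makes no attempt to handle either case, which the abstract attributes to the full method and its eALPS-LD extension.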