Conference: International Joint Conference on Neural Networks

A hierarchical learning approach to calibrate allele frequencies for SNP based genotyping of DNA pools


Abstract

Combining low-density SNP arrays with DNA pooling is a fast, cost-effective approach to genotyping that opens up basic genomics to a range of new applications and studies. However, we have identified significant limitations in the existing approach to calculating allele frequencies from DNA pools, including a reduced ability to deal with SNP-to-SNP variation via the standard interpolation method. Our contribution is a new hierarchical learning framework that resolves these drawbacks. The framework comprises a hierarchy of two greedily trained layers of learners. The first layer learns the bias of each SNP, then applies a calibration that reduces SNP bias by mapping all SNPs into a common coordinate system. The second layer learns an allele frequency function that exploits the global SNP data. A range of algorithms has been applied, including linear regression, neural networks, and support vector regression. The framework has been tested on pooled samples of Black Tiger prawns genotyped with low-density Sequenom iPLEX panels. Analysis of the pooled samples and the corresponding individually genotyped SNP samples indicates that the pooling approach introduces an allele frequency RMS error of 0.12. The existing calibration approach corrects ∼14% of that error. Our hierarchical approach is 4.5 times as effective, correcting ∼64% of the introduced error. This is a significant reduction and has the potential to enable genetic studies that were previously not possible because of allele frequency error. Although testing so far is limited to low-density SNP arrays, the approach was developed to generalize to other SNP genotyping technologies.
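The two-layer idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the input features or model details, so the code assumes one scalar raw pooled signal per SNP measurement, uses a per-SNP least-squares line for the first layer, and uses a global cubic polynomial as a stand-in for the second-layer learner (the abstract mentions linear regression, neural networks, and support vector regression). The names `TwoLayerCalibrator`, `fit_linear`, and `rms` are hypothetical.

```python
import numpy as np


def rms(a, b):
    """Root-mean-square difference between two arrays."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


def fit_linear(x, y):
    """Least-squares fit of y ~ slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


class TwoLayerCalibrator:
    """Illustrative two-layer calibrator, trained greedily (layer 1, then layer 2)."""

    def fit(self, snp_ids, raw, freq):
        snp_ids = np.asarray(snp_ids)
        raw = np.asarray(raw, float)
        freq = np.asarray(freq, float)

        # Layer 1 (assumed form): one linear map per SNP, so that every SNP's
        # raw signal lands in a common coordinate system.
        self.per_snp = {}
        for s in np.unique(snp_ids):
            mask = snp_ids == s
            self.per_snp[s] = fit_linear(raw[mask], freq[mask])

        # Layer 2 (assumed form): with layer 1 frozen, fit one global function
        # on the calibrated values pooled across all SNPs.
        z = self._layer1(snp_ids, raw)
        self.global_coef = np.polyfit(z, freq, 3)
        return self

    def _layer1(self, snp_ids, raw):
        slope = np.array([self.per_snp[s][0] for s in snp_ids])
        intercept = np.array([self.per_snp[s][1] for s in snp_ids])
        return slope * raw + intercept

    def predict(self, snp_ids, raw):
        z = self._layer1(np.asarray(snp_ids), np.asarray(raw, float))
        # Allele frequencies are proportions, so clip to [0, 1].
        return np.clip(np.polyval(self.global_coef, z), 0.0, 1.0)
```

In this sketch, training needs pools whose true allele frequencies are known, for example from the individually genotyped samples the abstract mentions; `rms(model.predict(ids, raw), freq)` can then be compared with `rms(raw, freq)` to see how much of the pooling error the calibration removes on a given data set.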
