2011 Seventh International Conference on Natural Computation

Automatic genotype calling of single nucleotide polymorphisms using a linear grouping algorithm



Abstract

Single nucleotide polymorphisms (SNPs) have become increasingly important in a wide range of genetic studies. High-throughput genotyping technologies usually rely on a statistical algorithm for automatic (non-manual) genotype calling. Most calling algorithms in the literature, which use methods such as k-means and mixture models, rely on elliptical structures in the genotyping intensity data, and they may fail when the intensity data have linear patterns. We propose an automatic genotype calling algorithm that further develops a linear grouping algorithm (LGA). The proposed method clusters data points around lines rather than around centroids. The clusters lie along lines because we do not normalize the intensities. In addition, we associate a quality value, the silhouette width, with each DNA sample and with each whole plate. For a data set of 101 SNPs from the TaqMan platform (Applied Biosystems), the LGA algorithm achieves 100% automatic calling, and 93% of samples pass a quality criterion and are assigned a genotype. For a subset of 30 SNPs where validated samples are available, the accuracy of the called genotypes is over 98%. Thus, a key feature of applying LGA to unnormalized TaqMan SNP assay fluorescent signals is that it can automatically and reliably call a substantial proportion of samples, reducing the need for manual intervention. It could potentially be adapted to other fluorescence-based SNP genotyping technologies such as the Invader Assay.
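The idea of clustering around lines rather than centroids can be illustrated with a minimal Python sketch. It is not the authors' implementation: it alternates between fitting an orthogonal-regression line to each group and reassigning every point to its nearest line, which is one common way to realize a linear grouping scheme. The function names, the random-restart strategy, the synthetic `simulate_fan` data, and the silhouette definition (based on orthogonal distances to the own line and the nearest other line) are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np

def fit_line(points):
    """Orthogonal-regression line through points: (center, unit direction)."""
    center = points.mean(axis=0)
    # The first right singular vector is the direction of greatest variance.
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    return center, vt[0]

def orth_dist(points, center, direction):
    """Orthogonal distance from each point to the line (center, direction)."""
    diff = points - center
    along = diff @ direction
    return np.linalg.norm(diff - np.outer(along, direction), axis=1)

def silhouette_widths(dists, labels):
    """Assumed line-based silhouette: a = distance to own line,
    b = distance to the nearest other line, s = (b - a) / max(a, b)."""
    idx = np.arange(len(labels))
    a = dists[idx, labels]
    others = dists.copy()
    others[idx, labels] = np.inf
    b = others.min(axis=1)
    return (b - a) / np.maximum(np.maximum(a, b), 1e-12)

def linear_grouping(points, k=3, n_starts=20, n_iter=50, seed=0):
    """Cluster points around k lines; keep the lowest-cost assignment seen."""
    rng = np.random.default_rng(seed)
    n = len(points)
    best_cost, best_labels, best_dists = np.inf, None, None
    for _ in range(n_starts):
        # Balanced random partition, so every group starts non-empty.
        labels = rng.permutation(np.arange(n) % k)
        for _ in range(n_iter):
            if np.bincount(labels, minlength=k).min() < 2:
                break  # a group collapsed; abandon this start
            lines = [fit_line(points[labels == j]) for j in range(k)]
            dists = np.column_stack(
                [orth_dist(points, c, d) for c, d in lines])
            new_labels = dists.argmin(axis=1)
            cost = float((dists.min(axis=1) ** 2).sum())
            if cost < best_cost:
                best_cost, best_labels, best_dists = cost, new_labels, dists
            if np.array_equal(new_labels, labels):
                break  # converged
            labels = new_labels
    return best_labels, silhouette_widths(best_dists, best_labels)

def simulate_fan(n_per=30, seed=1):
    """Synthetic (not real TaqMan) intensities: three noisy rays from the origin."""
    rng = np.random.default_rng(seed)
    pts, truth = [], []
    for g, angle in enumerate(np.deg2rad([10.0, 45.0, 80.0])):
        t = rng.uniform(1.0, 5.0, size=n_per)
        ray = np.column_stack([t * np.cos(angle), t * np.sin(angle)])
        pts.append(ray + rng.normal(scale=0.05, size=ray.shape))
        truth.append(np.full(n_per, g))
    return np.vstack(pts), np.concatenate(truth)

points, truth = simulate_fan()
labels, sil = linear_grouping(points, k=3)
```

A per-sample quality filter could then leave a sample uncalled when its silhouette width falls below a chosen cutoff, and a plate-level score could be taken as the mean width over the plate. Both the cutoff and the averaging are illustrative choices here, not values from the paper. Because the alternating procedure can stop in a local optimum, the sketch uses several random restarts and keeps the assignment with the smallest total squared orthogonal distance.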
