
A probabilistic approach for SNP discovery in high-throughput human resequencing data


Abstract

New high-throughput sequencing technologies are generating large amounts of sequence data, allowing the development of targeted large-scale resequencing studies. For these studies, accurate identification of polymorphic sites is crucial. Heterozygous sites are particularly difficult to identify, especially in regions of low coverage. We present a new strategy for identifying heterozygous sites in a single individual by using a machine learning approach that generates a heterozygosity score for each chromosomal position. Our approach also facilitates the identification of regions with unequal representation of two alleles and other poorly sequenced regions. The availability of confidence scores allows for a principled combination of sequencing results from multiple samples. We evaluate our method on a gold-standard genotype data set from HapMap. We are able to classify sites in this data set as heterozygous or homozygous with 98.5% accuracy. In de novo data, our probabilistic heterozygote detection (“ProbHD”) is able to identify 93% of heterozygous sites at a 99.9% overall agreement for genotype calls and close to 90% agreement for heterozygote calls. Overall, our data indicate that high-throughput resequencing of human genomic regions requires careful attention to systematic biases in sample preparation as well as sequence contexts, and that their impact can be alleviated by machine-learning-based sequence analyses allowing more accurate extraction of true DNA variants.

Bibliographic details

  • Source
    Genome Research | 2009, Issue 9 | pp. 1542-1552 | 11 pages
  • Author affiliations

    McGill Centre for Bioinformatics, McGill University, Montréal H36 0B1, Canada|School of Computer Sciences, McGill University, Montréal H3A 2T5, Canada;

    McGill University and Genome Québec Innovation Centre, Montréal H36 1A4, Canada;

    McGill University and Genome Québec Innovation Centre, Montréal H36 1A4, Canada;

    McGill University and Genome Québec Innovation Centre, Montréal H36 1A4, Canada;

    McGill Centre for Bioinformatics, McGill University, Montréal H36 0B1, Canada|School of Computer Sciences, McGill University, Montréal H3A 2T5, Canada;

    McGill University and Genome Québec Innovation Centre, Montréal H36 1A4, Canada|Department of Human Genetics, McGill University Health Centre (MUHC), McGill University, Montréal H36 1A4, Canada|Department of Medical Genetics, McGill University Health Centre (MUHC), McGill University, Montréal H36 1A4, Canada;

    McGill University and Genome Québec Innovation Centre, Montréal H36 1A4, Canada|Department of Human Genetics, McGill University Health Centre (MUHC), McGill University, Montréal H36 1A4, Canada|Department of Medical Genetics, McGill University Health Centre (MUHC), McGill University, Montréal H36 1A4, Canada;

    McGill University and Genome Québec Innovation Centre, Montréal H36 1A4, Canada|Department of Human Genetics, McGill University Health Centre (MUHC), McGill University, Montréal H36 1A4, Canada|Department of Medical Genetics, McGill University Health Centre (MUHC), McGill University, Montréal H36 1A4, Canada;

    McGill Centre for Bioinformatics, McGill University, Montréal H36 0B1, Canada|School of Computer Sciences, McGill University, Montréal H3A 2T5, Canada;

    McGill University and Genome Québec Innovation Centre, Montréal H36 1A4, Canada|Department of Human Genetics, McGill University Health Centre (MUHC), McGill University, Montréal H36 1A4, Canada|Department of Medical Genetics, McGill University Health Centre (MUHC), McGill University, Montréal H36 1A4, Canada;

  • Original format: PDF
  • Language of text: English
