Journal: Bioinformatics

A support vector machine for identification of single-nucleotide polymorphisms from next-generation sequencing data


Abstract

Motivation: Accurate determination of single-nucleotide polymorphisms (SNPs) from next-generation sequencing data is a significant challenge facing bioinformatics researchers. Most current methods use mechanistic models that assume nucleotides aligning to a given reference position are sampled from a binomial distribution. While such methods are sensitive, they are often unable to discriminate errors resulting from misaligned reads, sequencing errors or platform artifacts from true variants.

Results: To enable more accurate SNP calling, we developed an algorithm that uses a trained support vector machine (SVM) to determine variants from .BAM or .SAM formatted alignments of sequence reads. Our SVM-based implementation determines SNPs with significantly greater sensitivity and specificity than alternative platforms, including the UnifiedGenotyper included with the Genome Analysis Toolkit, samtools and FreeBayes. In addition, the quality scores produced by our implementation more accurately reflect the likelihood that a variant is real when compared with those produced by the Genome Analysis Toolkit. While results depend on the model used, the implementation includes tools to easily build new models and refine existing models with additional training data.

Availability: Source code and executables are available from github.com/brendanofallon/SNPSVM/
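
The binomial assumption mentioned under Motivation can be illustrated with a minimal sketch. This is not the SNPSVM algorithm, nor the model of any tool named in the abstract; it is a generic textbook-style diploid genotype caller. The function names, the three genotype labels, and the 1% per-read error rate are all assumptions made for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def genotype_likelihoods(alt_reads: int, depth: int, error_rate: float = 0.01) -> dict:
    """Likelihood of the observed alt-read count under three diploid genotypes.

    Assumed (hypothetical) per-read probability of seeing the alt allele:
      hom_ref -> error_rate, het -> 0.5, hom_alt -> 1 - error_rate
    """
    alt_prob = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1.0 - error_rate}
    return {g: binomial_pmf(alt_reads, depth, p) for g, p in alt_prob.items()}


def call_genotype(alt_reads: int, depth: int, error_rate: float = 0.01) -> str:
    """Return the genotype label with the highest binomial likelihood."""
    likelihoods = genotype_likelihoods(alt_reads, depth, error_rate)
    return max(likelihoods, key=likelihoods.get)


if __name__ == "__main__":
    # 9 of 20 reads carry the alternate base at this position.
    print(call_genotype(alt_reads=9, depth=20))
    # No reads carry the alternate base.
    print(call_genotype(alt_reads=0, depth=20))
```

A model of this kind sees only the allele counts. A position where about half the reads show the alternate base because of misalignment would receive the same heterozygous call as a true variant, which is the limitation the abstract describes.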
