Journal of Bioinformatics and Computational Biology

SAMSVM: A tool for misalignment filtration of SAM-format sequences with support vector machine

Abstract

Sequence Alignment/Map (SAM) formatted sequences [Li H, Handsaker B, Wysoker A et al., Bioinformatics 25(16): 2078-2079, 2009.] have taken on a central role in bioinformatics since the development of massively parallel sequencing. However, misaligned sequences pose a significant problem in the analysis of sequencing data because they can lead to false positives in variant calling, so misaligned reads must be excluded from the analysis. To this end, the multiple features of SAM-formatted sequences can be treated as vectors in a multidimensional space, which allows a support vector machine (SVM) to be applied. Using the LIBSVM tools developed by Chang and Lin [Chang C-C, Lin C-J, ACM Trans Intell Syst Technol 2:1-27, 2011.] as a simple interface for support vector classification, this study developed the SAMSVM package to enable misalignment filtration of SAM-formatted sequences. Cross-validation between two simulated datasets processed with SAMSVM yielded accuracies ranging from 0.89 to 0.97 and F-scores ranging from 0.77 to 0.94 across 14 groups with mutation rates from 0.001 to 0.1, indicating that the model built using SAMSVM was accurate in misalignment detection. Applying SAMSVM to actual sequencing data filtered out misaligned reads and corrected the variant calls.
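As an illustration of how the fields of a SAM record might be treated as a vector for an SVM, the following minimal sketch converts one alignment line into a numeric feature list and writes it in the sparse text format that LIBSVM reads (`<label> <index>:<value> ...`). The abstract does not list the features SAMSVM actually uses, so the feature selection below (MAPQ, two FLAG bits, CIGAR totals, the `NM` tag, read length) and the helper names `sam_features` and `to_libsvm` are assumptions made for this example only.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_features(line):
    """Turn one SAM alignment line into a list of numeric features.

    The chosen features are illustrative assumptions, not SAMSVM's own set.
    """
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])   # bitwise FLAG
    mapq = int(fields[4])   # mapping quality
    cigar = fields[5]
    seq = fields[9]

    # Sum the lengths of each CIGAR operation type.
    ops = {}
    for length, op in CIGAR_RE.findall(cigar):
        ops[op] = ops.get(op, 0) + int(length)

    # Optional NM tag: edit distance to the reference (0 if absent).
    nm = 0
    for tag in fields[11:]:
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])

    read_len = len(seq) if seq != "*" else 0
    return [
        float(mapq),
        float(bool(flag & 0x2)),    # properly paired
        float(bool(flag & 0x10)),   # read on reverse strand
        float(ops.get("M", 0) + ops.get("=", 0) + ops.get("X", 0)),  # aligned bases
        float(ops.get("I", 0) + ops.get("D", 0)),                    # indel bases
        float(ops.get("S", 0)),     # soft-clipped bases
        float(nm),
        float(read_len),
    ]


def to_libsvm(label, features):
    """Format a label (+1 or -1) and features as one sparse LIBSVM line."""
    parts = [f"{i}:{v:g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{label:+d}"] + parts)


# A made-up record for demonstration; it is not taken from the paper's data.
record = "read1\t99\tchr1\t100\t60\t8M2S\t=\t300\t210\tACGTACGTAC\tIIIIIIIIII\tNM:i:1"
vector = sam_features(record)
libsvm_line = to_libsvm(1, vector)
```

Lines produced this way for reads labeled as correctly aligned (+1) or misaligned (-1) could then be written to a file and passed to LIBSVM's training and prediction tools. How SAMSVM itself labels, scales, and selects features is not described in the abstract.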
