Journal of Bioinformatics and Computational Biology

ProgSIO-MSA: Progressive-based single iterative optimization framework for multiple sequence alignment using an effective scoring system

Abstract

Aligning more than two biological sequences is termed multiple sequence alignment (MSA). MSA is one of the primary activities in biological sequence analysis, with potential applications in phylogenetics, homology markers, protein structure prediction, gene regulation, and drug discovery. The MSA problem is considered NP-complete. Moreover, with the advancement of Next-Generation Sequencing techniques, gene and protein databases are continually loaded with vast amounts of raw sequence data that are neither analyzed nor annotated. Analyzing these growing volumes of raw sequences creates a strong need for computationally efficient (polynomial-time) models that still produce accurate alignments. This study proposes a progressive-based alignment model, named ProgSIO-MSA, which consists of an effective scoring system and an optimization framework. The proposed scoring system aligns sequences using a combination of two scoring strategies: Look Back Ahead, which scores a residue pair dynamically based on the status information of the previous position to improve the sum-of-pairs score, and Position-Residue-Specific Dynamic Gap Penalty, which dynamically penalizes a gap using a mutation matrix on the basis of the residue and its position information. The proposed single iterative optimization (SIO) framework identifies and optimizes local optima traps to improve alignment quality. The proposed model is evaluated against progressive-based state-of-the-art models on two benchmark datasets, BAliBASE and SABmark. The alignment quality (biological accuracy) of the proposed model improves by 17.7% on the BAliBASE dataset. The model's efficiency is compared with state-of-the-art models using both time complexity and runtime analysis. Wilcoxon signed-rank test results indicate that the alignment quality of the proposed model is significantly better than that of progressive-based state-of-the-art models.
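The abstract refers to the sum-of-pairs score without defining it. As a rough illustration only, the sketch below computes a generic sum-of-pairs score for a small alignment. The match, mismatch, and gap values and the names `pair_score` and `sum_of_pairs` are assumptions made for this example; this is not the ProgSIO-MSA scoring system, which the abstract describes as using a mutation matrix and position-dependent gap penalties.

```python
from itertools import combinations


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one aligned residue pair under a toy scheme (assumed values)."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs are conventionally ignored
    if a == "-" or b == "-":
        return gap  # fixed gap penalty; the paper's penalty is dynamic
    return match if a == b else mismatch


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum pair_score over every column and every unordered pair of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


# Hypothetical three-sequence alignment, with '-' marking gaps.
example = ["AC-GT", "ACTGT", "A--GT"]
print(sum_of_pairs(example))
```

A real evaluation would replace the fixed match and mismatch values with a substitution matrix such as BLOSUM or PAM, and benchmark suites like BAliBASE score alignments against reference alignments rather than with this raw total.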