Accurate tools for multiple sequence alignment (MSA) are essential for comparative studies of the function and structure of biological sequences. However, it is very challenging to develop a computationally efficient algorithm that can consistently predict accurate alignments for various types of sequence sets. In this article, we introduce PicXAA (Probabilistic Maximum Accuracy Alignment), a probabilistic non-progressive alignment algorithm that aims to find protein alignments with maximum expected accuracy. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively captures the local similarities among sequences. Evaluations on several widely used benchmark sets show that PicXAA consistently yields accurate alignment results on a wide range of reference sets, with especially remarkable improvements over other leading algorithms on sequence sets with local similarities. PicXAA source code is freely available at: ∼bjyoon/picxaa/.
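The greedy build-up idea can be illustrated with a toy sketch. The code below is not PicXAA itself: it assumes only two sequences and a hypothetical dictionary `posterior` of residue-pair alignment probabilities, and it accepts pairs in order of decreasing probability whenever they are compatible (one-to-one and non-crossing) with the pairs already chosen. The function name, the `threshold` parameter, and the example probabilities are all assumptions made for illustration. The abstract describes PicXAA as doing this kind of greedy construction across multiple sequences, with details that are not given here.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (index in sequence X, index in sequence Y)


def greedy_pairwise_alignment(
    posterior: Dict[Pair, float], threshold: float = 0.0
) -> List[Pair]:
    """Toy greedy construction of a pairwise alignment (illustrative only).

    Pairs are visited from most to least probable. A pair is accepted only
    if it lies strictly before or strictly after every accepted pair in
    both sequences, which keeps the alignment one-to-one and non-crossing.
    """
    accepted: List[Pair] = []
    for (i, j), p in sorted(posterior.items(), key=lambda kv: -kv[1]):
        if p <= threshold:
            break  # remaining pairs are too unlikely to be worth aligning
        compatible = all(
            (i < a and j < b) or (i > a and j > b) for a, b in accepted
        )
        if compatible:
            accepted.append((i, j))
    return sorted(accepted)


# Hypothetical probabilities for two short sequences.
example_posterior = {
    (0, 0): 0.9,
    (1, 2): 0.8,
    (2, 1): 0.7,  # crosses (1, 2), so it is rejected
    (2, 3): 0.6,
    (1, 1): 0.3,  # reuses position 1 of X, so it is rejected
}
alignment = greedy_pairwise_alignment(example_posterior)
print(alignment)
```

Because the most confident pairs are fixed first, strongly similar local regions anchor the alignment before weaker pairs are considered, which is the intuition behind building a global alignment from local similarities.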