International Workshop on Algorithms in Bioinformatics (conference paper)

Predicting Core Columns of Protein Multiple Sequence Alignments for Improved Parameter Advising

Abstract

In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference are those that can be confidently labeled as correct, usually because all residues in the column are sufficiently close in the spatial superposition of the folded three-dimensional structures of the proteins. When a protein multiple sequence alignment is computed in practice, the reference alignment is not known, so coreness can only be predicted. We develop the first predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and hence to better estimate the alignment's accuracy. Our approach to predicting coreness is similar to nearest-neighbor classification from machine learning, except that we transform nearest-neighbor distances into a coreness prediction via a regression function, and we learn an appropriate distance function through a new optimization formulation that solves a large-scale linear programming problem. We apply our coreness predictor to parameter advising, the task of choosing parameter values for an aligner's scoring function to obtain a more accurate alignment of a specific set of sequences. We show that for this task our predictor strongly outperforms other column-confidence estimators from the literature and affords a substantial boost in alignment accuracy.
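
The coreness definition above can be made concrete with a small sketch, offered as an illustration rather than the authors' code. It assumes both alignments contain the same sequences in the same order, written as equal-length gapped strings with `-` as the gap character. It treats a substitution as any pair of residues placed in the same computed column, and counts that pair as core when the reference puts both residues in one column that belongs to a hypothetical `core_columns` set of reference column indices. The function names are illustrative only.

```python
from itertools import combinations

GAP = "-"  # assumed gap character


def residue_positions(alignment):
    """For each column, map sequence index -> residue index (0-based position
    in the ungapped sequence) for every non-gap entry in that column."""
    counters = [0] * len(alignment)
    columns = []
    for c in range(len(alignment[0])):
        col = {}
        for s, row in enumerate(alignment):
            if row[c] != GAP:
                col[s] = counters[s]
                counters[s] += 1
        columns.append(col)
    return columns


def column_coreness(computed, reference, core_columns):
    """Coreness of each column of `computed`: the fraction of its residue pairs
    (substitutions) that the reference aligns together in a core column.
    Columns holding fewer than two residues have no substitutions -> None."""
    # Locate every residue (sequence, position) in the reference alignment.
    ref_col_of = {}
    for c, col in enumerate(residue_positions(reference)):
        for s, pos in col.items():
            ref_col_of[(s, pos)] = c

    result = []
    for col in residue_positions(computed):
        pairs = list(combinations(sorted(col.items()), 2))
        if not pairs:
            result.append(None)
            continue
        core = 0
        for (s1, p1), (s2, p2) in pairs:
            c1 = ref_col_of[(s1, p1)]
            # Core substitution: same reference column, and that column is core.
            if c1 == ref_col_of[(s2, p2)] and c1 in core_columns:
                core += 1
        result.append(core / len(pairs))
    return result


if __name__ == "__main__":
    reference = ["ACGT", "ACGT", "AC-T"]  # toy reference; column 2 not core
    computed = ["ACGT", "ACGT", "A-CT"]   # misaligns the third sequence's C
    print(column_coreness(computed, reference, {0, 1, 3}))
    # [1.0, 1.0, 0.0, 1.0]
```

In the toy example, the third computed column pairs the third sequence's `C` with two `G`s. The reference does not align those residues together, and its `G` column is not core, so that column scores 0.0.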

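The prediction and advising steps described in the abstract can be outlined with a simplified sketch. This is not the authors' method, and every modeling choice here is an assumption. Each column is taken to be already summarized as a numeric feature vector. Plain Euclidean distance stands in for the distance function the paper learns by linear programming. The distance to the nearest core-labeled training column is mapped to a coreness value by a decreasing logistic curve used as the regression function, whose hypothetical `midpoint` and `steepness` parameters would have to be fit on training data. For advising, each candidate parameter choice is scored by the mean predicted coreness of its alignment's columns, a stand-in for a full accuracy estimator.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    # Stand-in for the learned distance function (assumption).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_core_distance(column: Vector, core_examples: List[Vector]) -> float:
    """Distance from a column's feature vector to the closest training column
    labeled core."""
    return min(euclidean(column, ex) for ex in core_examples)


def regress(distance: float, midpoint: float = 1.0, steepness: float = 4.0) -> float:
    """Hypothetical regression function: a decreasing logistic curve mapping
    distance to a coreness value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(steepness * (distance - midpoint)))


def predict_coreness(columns: List[Vector], core_examples: List[Vector]) -> List[float]:
    return [regress(nearest_core_distance(c, core_examples)) for c in columns]


def advise(alignments: Dict[str, List[Vector]], core_examples: List[Vector]) -> str:
    """Return the parameter choice whose alignment has the highest mean
    predicted coreness (simplified accuracy estimate)."""
    def score(columns: List[Vector]) -> float:
        preds = predict_coreness(columns, core_examples)
        return sum(preds) / len(preds)

    return max(alignments, key=lambda choice: score(alignments[choice]))


if __name__ == "__main__":
    core_examples = [[0.0, 0.0], [1.0, 1.0]]  # toy training columns labeled core
    alignments = {  # hypothetical parameter choices -> column feature vectors
        "gap_open=10": [[0.1, 0.0], [1.0, 0.9]],
        "gap_open=5": [[3.0, 3.0], [4.0, 0.0]],
    }
    print(advise(alignments, core_examples))  # gap_open=10
```

The parameter labels and feature vectors are invented for illustration. Swapping in a learned distance or a different monotone regression curve changes only `euclidean` and `regress`, leaving the advising loop unchanged.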