IEEE/ACM Transactions on Computational Biology and Bioinformatics

Learning Scoring Schemes for Sequence Alignment from Partial Examples



Abstract

When aligning biological sequences, the choice of scoring scheme is critical. Even small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to learn parameter values that are appropriate for biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct reference alignments, this is the problem of finding parameter values that make the scores of the reference alignments as close as possible to the scores of optimal alignments of their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the reference alignment is not specified, and to an improved formulation based on minimizing the average error between the scores of the reference alignments and the scores of optimal alignments. Experiments on benchmark biological alignments show that we can learn scoring schemes that generalize across protein families and that boost the accuracy of multiple sequence alignment by as much as 25 percent.
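To make the objective concrete, here is a minimal Python sketch of the quantity being minimized. The simplifying assumptions are ours, not the paper's: global pairwise alignment, a scoring scheme with only three parameters (match, mismatch, and a linear per-character gap penalty), and complete reference alignments with no unspecified regions. For each reference alignment it computes the optimal score with Needleman-Wunsch, subtracts the reference alignment's own score, and averages these differences. `fit_gap` then picks the gap penalty from a small grid with the lowest average error. The names `Scheme`, `optimal_score`, `alignment_score`, `average_error`, and `fit_gap` are hypothetical, and the grid search is a crude stand-in for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Scheme:
    """Hypothetical scoring scheme: match/mismatch scores and a linear gap penalty."""
    match: float
    mismatch: float
    gap: float  # penalty per gap character (linear gaps are an assumption here)


def optimal_score(a: str, b: str, s: Scheme) -> float:
    """Best global alignment score of a and b (Needleman-Wunsch, maximizing)."""
    m = len(b)
    prev = [-s.gap * j for j in range(m + 1)]  # row 0: prefix of b against gaps
    for i in range(1, len(a) + 1):
        cur = [-s.gap * i] + [0.0] * m
        for j in range(1, m + 1):
            sub = s.match if a[i - 1] == b[j - 1] else s.mismatch
            cur[j] = max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                         prev[j] - s.gap,     # a[i-1] against a gap
                         cur[j - 1] - s.gap)  # b[j-1] against a gap
        prev = cur
    return prev[m]


def alignment_score(row_a: str, row_b: str, s: Scheme) -> float:
    """Score of a fixed reference alignment given as two equal-length gapped rows.

    Assumes no column has '-' in both rows.
    """
    total = 0.0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total -= s.gap
        else:
            total += s.match if x == y else s.mismatch
    return total


Example = Tuple[str, str]


def average_error(examples: List[Example], s: Scheme) -> float:
    """Mean of (optimal score - reference score); each term is >= 0."""
    errors = []
    for row_a, row_b in examples:
        a, b = row_a.replace("-", ""), row_b.replace("-", "")
        errors.append(optimal_score(a, b, s) - alignment_score(row_a, row_b, s))
    return sum(errors) / len(errors)


def fit_gap(examples: List[Example], match: float = 1.0, mismatch: float = -1.0,
            grid: Iterable[float] = (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)) -> float:
    """Crude grid search: the gap penalty with the lowest average error."""
    return min(grid, key=lambda g: average_error(examples, Scheme(match, mismatch, g)))


if __name__ == "__main__":
    # Toy references: one prefers a mismatch, the other prefers two gaps.
    refs = [("ACT", "AGT"), ("AC-T", "A-GT")]
    print(fit_gap(refs, grid=(0.25, 0.5, 1.0, 2.0)))  # prints 0.5
```

Because the optimal score is never below the score of any particular alignment, each error term is nonnegative. Fixing the match and mismatch scores also rules out the degenerate all-zero scheme, under which every alignment is optimal and the error is trivially zero.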
