Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Learning Parameter-Advising Sets for Multiple Sequence Alignment

Abstract

While the multiple sequence alignment output by an aligner strongly depends on the parameter values used for the alignment scoring function (such as the choice of gap penalties and substitution scores), most users rely on the single default parameter setting provided by the aligner. A different parameter setting, however, might yield a much higher-quality alignment for the specific set of input sequences. The problem of picking a good choice of parameter values for specific input sequences is called parameter advising. A parameter advisor has two ingredients: (i) a set of parameter choices to select from, and (ii) an estimator that provides an estimate of the accuracy of the alignment computed by the aligner using a parameter choice. The parameter advisor picks the parameter choice from the set whose resulting alignment has the highest estimated accuracy. In this paper, we consider for the first time the problem of learning the optimal set of parameter choices for a parameter advisor that uses a given accuracy estimator. The optimal set is one that maximizes the expected true accuracy of the resulting parameter advisor, averaged over a collection of training data. While we prove that learning an optimal set for an advisor is NP-complete, we show there is a natural approximation algorithm for this problem, and prove a tight bound on its approximation ratio. Experiments with an implementation of this approximation algorithm on biological benchmarks, using various accuracy estimators from the literature, show it finds sets for advisors that are surprisingly close to optimal. Furthermore, the resulting parameter advisors are significantly more accurate in practice than simply aligning with a single default parameter choice.
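The two ingredients of an advisor, and the objective that an optimal advisor set maximizes, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the function names, the parameter labels `default`, `a`, and `b`, and the toy accuracy values are hypothetical and do not come from the paper. The exhaustive search at the end is not the paper's approximation algorithm, which the abstract does not describe; it only makes the definition of an optimal set concrete on a tiny instance.

```python
from itertools import combinations

def advise(choices, est_acc):
    """Return the parameter choice whose alignment has the highest estimated accuracy."""
    return max(choices, key=lambda p: est_acc[p])

def advisor_accuracy(choices, benchmarks):
    """Mean TRUE accuracy of the advisor's picks over the training benchmarks."""
    total = 0.0
    for est_acc, true_acc in benchmarks:
        total += true_acc[advise(choices, est_acc)]
    return total / len(benchmarks)

def best_set_brute_force(universe, k, benchmarks):
    """Try every size-k subset; exponential in general, so only for toy inputs."""
    return max(combinations(universe, k),
               key=lambda s: advisor_accuracy(s, benchmarks))

# Hypothetical training data: one (estimated, true) accuracy table per benchmark.
benchmarks = [
    ({"default": 0.6, "a": 0.8, "b": 0.5}, {"default": 0.5, "a": 0.9, "b": 0.4}),
    ({"default": 0.7, "a": 0.4, "b": 0.9}, {"default": 0.6, "a": 0.3, "b": 0.5}),
]

best = best_set_brute_force(["default", "a", "b"], 2, benchmarks)
print(best, advisor_accuracy(best, benchmarks))
```

In the second toy benchmark the estimator ranks `b` highest even though `default` has the better true accuracy, so including `b` in the set can lower the advisor's true accuracy. This is why the best set depends on the estimator in use and not only on the true accuracies of the individual parameter choices.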
