BMC Bioinformatics

Predicting human splicing branchpoints by combining sequence-derived features and multi-label learning methods


Abstract

Alternative splicing is a critical process in the coding of a single gene: it removes introns and joins exons, and splicing branchpoints are indicators of alternative splicing. Wet-lab experiments have identified a great number of human splicing branchpoints, but many branchpoints are still unknown. To guide wet-lab experiments, we develop computational methods to predict human splicing branchpoints. Because an intron may have multiple branchpoints, we formulate branchpoint prediction as a multi-label learning problem and attempt to predict branchpoint sites from intron sequences. First, we investigate a variety of intron sequence-derived features, such as the sparse profile, dinucleotide profile, position weight matrix profile, Markov motif profile and polypyrimidine tract profile. Second, we consider several multi-label learning methods (partial least squares regression, canonical correlation analysis and regularized canonical correlation analysis) and use them as the basic classification engines. Third, we propose two ensemble learning schemes that integrate different features and different classifiers to build ensemble learning systems for branchpoint prediction: one is a genetic algorithm-based weighted average ensemble method; the other is a logistic regression-based ensemble method. In the computational experiments, the two ensemble learning methods outperform benchmark branchpoint prediction methods and produce high-accuracy results on the benchmark dataset.
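
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the "sparse profile" is a one-hot encoding of each nucleotide, that the multi-label target is one binary label per intron position, and that the weighted average ensemble combines per-position scores from several base predictors. All function names are hypothetical, and the weights are hand-picked placeholders, whereas the abstract describes a genetic algorithm choosing them.

```python
from typing import List, Sequence

NUCLEOTIDES = "ACGT"


def sparse_profile(seq: str) -> List[int]:
    """One-hot encode each nucleotide (assumed meaning of 'sparse profile')."""
    vec: List[int] = []
    for base in seq.upper():
        # Four bits per position; an unknown base such as N becomes all zeros.
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def label_vector(length: int, branchpoints: Sequence[int]) -> List[int]:
    """Multi-label target: one binary label per position (0-based indices)."""
    labels = [0] * length
    for pos in branchpoints:
        labels[pos] = 1
    return labels


def weighted_average(scores: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Combine per-position scores from several base predictors."""
    total = sum(weights)
    n_positions = len(scores[0])
    return [
        sum(w * s[i] for w, s in zip(weights, scores)) / total
        for i in range(n_positions)
    ]


if __name__ == "__main__":
    seq = "ACGT"                        # toy intron fragment, not real data
    print(sparse_profile(seq))          # 16 values, 4 per nucleotide
    print(label_vector(len(seq), [1, 3]))
    # Two hypothetical base predictors scoring two positions each.
    print(weighted_average([[0.2, 0.8], [0.6, 0.4]], [1.0, 3.0]))
```

Because every position carries its own label, an intron with several branchpoints is represented by a single target vector with several ones, which is what makes the multi-label formulation a natural fit.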
