Conference: IEEE International Conference on Bioinformatics and Biomedicine
The prediction of human splicing branchpoints by multi-label learning

Abstract

Human splicing branchpoints are functional elements of alternative splicing, and studying them can help to clarify the mechanisms that act on human pre-mRNA transcripts. There are a large number of human splicing branchpoints, but the wet-lab methods that identify them are labor-intensive and time-consuming. In this paper, we use machine learning techniques to build models for human branchpoint prediction. Since an intron may have multiple branchpoints, we formulate the problem as a multi-label learning task that predicts the branchpoint sites of an intron from the characteristics of that intron. First, we extract a diverse set of intron sequence-derived features: sparse profile, dinucleotide profile, position weight matrix profile, Markov motif profile, and polypyrimidine tract profile. Then, taking both efficiency and effectiveness into account, we adopt three methods (partial least squares regression, canonical correlation analysis, and regularized canonical correlation analysis) to build multi-label prediction models from different angles using the intron sequence-derived features. Finally, we adopt an average scoring ensemble strategy to integrate the different models and develop an ensemble model for branchpoint prediction. Computational experiments demonstrate that the proposed method produces satisfactory results on the experimentally verified dataset and outperforms other state-of-the-art methods. We also provide a user-friendly web server for human splicing branchpoint prediction, available at http://121.42.59.182:8080.
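To make the pipeline in the abstract more concrete, the following is a minimal sketch of two of the feature encodings and of the average scoring ensemble. It is not the authors' implementation. The function names, the A/C/G/T ordering, the one-hot layout of both profiles, and the 0.5 decision threshold are all assumptions made for illustration, and the per-model scores in the usage lines are invented placeholder values rather than outputs of trained PLS, CCA, or regularized CCA models.

```python
# Illustrative sketch only; all names and encodings below are assumptions.
NUCLEOTIDES = "ACGT"  # assumed ordering of the alphabet


def sparse_profile(seq):
    """One-hot encode each nucleotide: 4 binary values per position."""
    features = []
    for base in seq.upper():
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def dinucleotide_profile(seq):
    """One-hot encode each adjacent nucleotide pair: 16 binary values per pair."""
    pairs = [a + b for a in NUCLEOTIDES for b in NUCLEOTIDES]
    seq = seq.upper()
    features = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        features.extend(1 if pair == p else 0 for p in pairs)
    return features


def average_scores(model_scores):
    """Average the per-position scores of several models with equal weights."""
    n_models = len(model_scores)
    return [sum(column) / n_models for column in zip(*model_scores)]


def predict_labels(scores, threshold=0.5):
    """Mark a position as a branchpoint when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Usage with made-up scores for three candidate positions of one intron.
scores_pls = [0.1, 0.8, 0.3]   # placeholder for a PLS regression model
scores_cca = [0.2, 0.6, 0.5]   # placeholder for a CCA model
scores_rcca = [0.3, 0.7, 0.1]  # placeholder for a regularized CCA model

ensemble = average_scores([scores_pls, scores_cca, scores_rcca])
labels = predict_labels(ensemble)  # one 0/1 label per candidate position
```

Because every candidate position receives its own label, a single intron can be assigned several branchpoints, which is the reason the abstract frames the task as multi-label learning rather than single-site classification.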