Journal: Cell Reports

Semi-supervised Learning Predicts Approximately One Third of the Alternative Splicing Isoforms as Functional Proteins


Abstract

Alternative splicing acts on transcripts from almost all human multi-exon genes. Notwithstanding its ubiquity, fundamental ramifications of splicing on protein expression remain unresolved. The number and identity of spliced transcripts that form stably folded proteins remain the sources of considerable debate, due largely to low coverage of experimental methods and the resulting absence of negative data. We circumvent this issue by developing a semi-supervised learning algorithm, positive unlabeled learning for splicing elucidation (PULSE; http://www.kimlab.org/software/pulse), which uses 48 features spanning various categories. We validated its accuracy on sets of bona fide protein isoforms and directly on mass spectrometry (MS) spectra for an overall AU-ROC of 0.85. We predict that around 32% of "exon skipping" alternative splicing events produce stable proteins, suggesting that the process engenders a significant number of previously uncharacterized proteins. We also provide insights into the distribution of positive isoforms in various functional classes and into the structural effects of alternative splicing.
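To make the idea of learning from positive and unlabeled examples concrete, here is a minimal sketch. It is not the PULSE implementation, whose internals the abstract does not describe. It assumes the Elkan-Noto correction, a generic positive-unlabeled technique swapped in for illustration. The data are synthetic, with two made-up features rather than the 48 used by PULSE, and all function names (`fit_logistic`, `score`, `auroc`, `run_demo`) are hypothetical. The sketch also shows how an AU-ROC and a "fraction predicted positive" figure could be computed.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, s, epochs=400, lr=0.3):
    """Batch gradient descent for logistic regression estimating P(s=1 | x)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(X, s):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def score(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def auroc(scores, labels):
    """Chance that a random positive outscores a random negative; ties count half."""
    pos = [v for v, y in zip(scores, labels) if y == 1]
    neg = [v for v, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def run_demo(seed=0):
    rng = random.Random(seed)
    # Synthetic stand-in data (an assumption): 200 hidden positives, 200 hidden negatives.
    X = [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(200)]
    X += [[rng.gauss(-1.5, 1.0), rng.gauss(-1.5, 1.0)] for _ in range(200)]
    y = [1] * 200 + [0] * 200
    # Roughly 40% of the true positives are labeled (s=1); everything else is unlabeled (s=0).
    s = [1 if (yi == 1 and rng.random() < 0.4) else 0 for yi in y]

    labeled = [i for i, si in enumerate(s) if si == 1]
    held_out = set(labeled[::4])  # every 4th labeled positive, used only to estimate c
    train = [i for i in range(len(X)) if i not in held_out]

    # Step 1: train a classifier to separate labeled from unlabeled examples.
    model = fit_logistic([X[i] for i in train], [s[i] for i in train])
    # Step 2: c estimates P(s=1 | y=1) as the mean score on held-out labeled positives.
    c = sum(score(model, X[i]) for i in held_out) / len(held_out)
    # Step 3: corrected posterior P(y=1 | x) = P(s=1 | x) / c, clipped to 1.
    prob = [min(1.0, score(model, x) / c) for x in X]

    unlabeled = [i for i, si in enumerate(s) if si == 0]
    frac = sum(prob[i] > 0.5 for i in unlabeled) / len(unlabeled)
    auc = auroc([prob[i] for i in unlabeled], [y[i] for i in unlabeled])
    return auc, c, frac


if __name__ == "__main__":
    auc, c, frac = run_demo()
    print(f"AU-ROC on unlabeled set: {auc:.3f}")
    print(f"estimated label frequency c: {c:.3f}")
    print(f"fraction of unlabeled predicted positive: {frac:.3f}")
```

In this toy setting the hidden true labels are available, so the AU-ROC can be computed directly. With real splicing data no such ground truth exists, which is why the abstract reports validation against bona fide protein isoforms and MS spectra instead.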
