Conference: 2011 IEEE International Conference on Bioinformatics and Biomedicine

Semi-supervised Learning of Alternatively Spliced Exons Using Co-training


Abstract

Alternative splicing is a phenomenon that gives rise to multiple mRNA transcripts from a single gene. It is believed that a large number of genes undergo alternative splicing. Predicting alternative splicing events is a problem of great interest, as it can help in understanding transcript diversity. Supervised machine learning approaches can be used to predict alternative splicing events at the genome level. However, supervised approaches require large amounts of labeled data to learn accurate classifiers. While new sequencing technologies produce large amounts of genomic data, labeling these data can be costly and time-consuming. Therefore, semi-supervised learning approaches that can make use of large amounts of unlabeled data, in addition to small amounts of labeled data, are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm uses two views of the data to iteratively learn two classifiers that inform each other, at each step, with their best predictions on the unlabeled data. We consider two sets of features for constructing views for the problem of predicting alternatively spliced exons: exonic splicing enhancers and intronic regulatory sequences. We use the Naive Bayes Multinomial algorithm as the base classifier in our study. Experimental results show that using the unlabeled data can result in better classifiers than those obtained from the small amount of labeled data alone.
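
The co-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the small `MultinomialNB` class, the `co_train` function, the `rounds` and `per_round` parameters, and the motif tokens in the toy data are all assumptions made for the example. View 0 stands in for exonic splicing enhancer counts and view 1 for intronic regulatory sequence counts.

```python
import math
from collections import Counter

class MultinomialNB:
    """Tiny multinomial Naive Bayes over token lists, with Laplace smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        n = len(labels)
        self.prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        v = len(self.vocab)
        logp = {}
        for c in self.classes:
            s = self.prior[c]
            for tok in doc:
                if tok in self.vocab:  # tokens never seen in training are skipped
                    s += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            logp[c] = s
        m = max(logp.values())  # subtract the max before exponentiating
        z = sum(math.exp(x - m) for x in logp.values())
        return {c: math.exp(logp[c] - m) / z for c in self.classes}

def train_views(labeled):
    """Fit one classifier per view on the current labeled set."""
    ys = [y for _, y in labeled]
    return [MultinomialNB().fit([x[v] for x, _ in labeled], ys) for v in (0, 1)]

def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: [((view0_tokens, view1_tokens), label)]; unlabeled: [(view0_tokens, view1_tokens)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        for v, clf in enumerate(train_views(labeled)):
            scored = []
            for i, x in enumerate(pool):
                proba = clf.predict_proba(x[v])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            scored.sort(reverse=True)
            # Each view nominates its most confident pool examples; if both
            # views pick the same example, the first view's label is kept.
            for _, i, label in scored[:per_round]:
                picked.setdefault(i, label)
        labeled += [(pool[i], y) for i, y in picked.items()]
        pool = [x for i, x in enumerate(pool) if i not in picked]
    return train_views(labeled), labeled

# Toy data with made-up motif tokens ("AS" = alternatively spliced, "CONST" = constitutive).
labeled = [((["GAAGAA", "GAAGAA"], ["TGCATG"]), "AS"),
           ((["CCTGCC"], ["GGGGGG"]), "CONST")]
unlabeled = [(["GAAGAA"], ["TGCATG", "TGCATG"]),
             (["CCTGCC", "CCTGCC"], ["GGGGGG"]),
             (["GAAGAA", "GAAGAA"], ["TGCATG"]),
             (["CCTGCC"], ["GGGGGG", "GGGGGG"])]
clfs, grown = co_train(labeled, unlabeled, rounds=3, per_round=1)
print(len(labeled), "labeled examples grew to", len(grown))
```

In this sketch both classifiers are retrained from scratch each round on the shared, growing labeled set. Real feature extraction (counting known enhancer or intronic motifs per exon) and the choice of how many examples to add per round are left out and would need to follow the paper's actual setup.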
