Published in: IEEE International Conference on Bioinformatics and Biomedicine

Semi-supervised Learning of Alternatively Spliced Exons Using Co-training

Abstract

Alternative splicing is a phenomenon that gives rise to multiple mRNA transcripts from a single gene. It is believed that a large number of genes undergo alternative splicing. Predicting alternative splicing events is a problem of great interest, as it can aid the understanding of transcript diversity. Supervised machine learning approaches can be used to predict alternative splicing events at the genome level. However, supervised approaches require large amounts of labeled data to learn accurate classifiers. While new sequencing technologies produce large amounts of genomic data, labeling these data can be costly and time-consuming. Therefore, semi-supervised learning approaches that can make use of large amounts of unlabeled data, in addition to small amounts of labeled data, are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm uses two views of the data to iteratively learn two classifiers that inform each other, at each step, with their most confident predictions on the unlabeled data. We consider two sets of features for constructing views for the problem of predicting alternatively spliced exons: exonic splicing enhancers and intronic regulatory sequences. We use the Naive Bayes Multinomial algorithm as the base classifier in our study. Experimental results show that using the unlabeled data can yield better classifiers than those learned from the small amount of labeled data alone.
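
The co-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny `MultinomialNB` class, the `co_train` function, the `rounds` and `per_round` parameters, the motif tokens (`ese_a`, `int_x`, ...), and the toy exons are all hypothetical. Each exon is represented by two views, a list of exonic enhancer tokens and a list of intronic motif tokens, and in every round each view's classifier moves its most confident unlabeled exon into the labeled set.

```python
import math
from collections import Counter


class MultinomialNB:
    """Tiny multinomial Naive Bayes over token lists (Laplace smoothing)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, doc):
        v = len(self.vocab)
        logp = {}
        for c in self.classes:
            lp = math.log(self.prior[c] / self.n)
            for tok in doc:
                if tok in self.vocab:  # tokens unseen in training are ignored
                    lp += math.log((self.counts[c][tok] + 1) / (self.total[c] + v))
            logp[c] = lp
        m = max(logp.values())  # subtract the max before exponentiating
        z = sum(math.exp(lp - m) for lp in logp.values())
        return {c: math.exp(lp - m) / z for c, lp in logp.items()}


def train_views(labeled):
    """Train one classifier per view on the current labeled set."""
    ys = [y for _, y in labeled]
    return [MultinomialNB().fit([x[v] for x, _ in labeled], ys) for v in (0, 1)]


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: [((view0, view1), label)]; unlabeled: [(view0, view1)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        for v, clf in enumerate(train_views(labeled)):
            # Score the remaining pool using only this classifier's view.
            scored = []
            for x in pool:
                proba = clf.predict_proba(x[v])
                label = max(proba, key=proba.get)
                scored.append((proba[label], label, x))
            scored.sort(key=lambda t: t[0], reverse=True)
            # The most confident predictions become new training examples,
            # which the other view's classifier sees in the next round.
            for _, label, x in scored[:per_round]:
                labeled.append((x, label))
                pool.remove(x)
    return train_views(labeled), labeled


# Hypothetical toy data: (exonic enhancer tokens, intronic motif tokens).
labeled = [
    ((["ese_a", "ese_b"], ["int_x"]), "alt"),
    ((["ese_c"], ["int_y", "int_z"]), "const"),
]
unlabeled = [
    (["ese_a"], ["int_x", "int_x"]),
    (["ese_c", "ese_c"], ["int_y"]),
    (["ese_b"], ["int_x"]),
    (["ese_c"], ["int_z"]),
]

clfs, grown = co_train(labeled, unlabeled, rounds=2, per_round=1)
print(len(grown), clfs[0].predict_proba(["ese_a"]))
```

In this sketch both classifiers are retrained from scratch at the start of each round, and the number of examples added per round is a free parameter. How many examples to add, and whether to balance them across classes, are design choices the abstract does not specify.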
