Semi-supervised Learning of Alternatively Spliced Exons Using Co-training

Abstract

Alternative splicing is a phenomenon that gives rise to multiple mRNA transcripts from a single gene. It is believed that a large number of genes undergo alternative splicing. Predicting alternative splicing events is a problem of great interest, as it can aid in understanding transcript diversity. Supervised machine learning approaches can be used to predict alternative splicing events at the genome level. However, supervised approaches require large amounts of labeled data to learn accurate classifiers. While large amounts of genomic data are produced by the new sequencing technologies, labeling these data can be costly and time-consuming. Therefore, semi-supervised learning approaches that can make use of large amounts of unlabeled data, in addition to small amounts of labeled data, are highly desirable. In this work, we study the usefulness of a semi-supervised learning approach, co-training, for classifying exons as alternatively spliced or constitutive. The co-training algorithm makes use of two views of the data to iteratively learn two classifiers that can inform each other, at each step, with their best predictions on the unlabeled data. We consider two sets of features for constructing views for the problem of predicting alternatively spliced exons: exonic splicing enhancers and intronic regulatory sequences. We use the Naive Bayes Multinomial algorithm as a base classifier in our study. Experimental results show that using the unlabeled data can result in better classifiers than those obtained from the small amount of labeled data alone.
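The co-training loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `MultinomialNB` class is a hand-written stand-in for a Naive Bayes Multinomial classifier, the function `co_train`, its parameters `rounds` and `per_round`, the shared labeled pool, and the labels `"AS"` / `"CONST"` are hypothetical, and the motif counts are made-up toy data rather than real exon features.

```python
import math


class MultinomialNB:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, X, y):
        # X: list of count vectors (non-negative ints); y: list of class labels.
        self.classes = sorted(set(y))
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = [sum(col) + 1 for col in zip(*rows)]  # add-one smoothing
            denom = sum(totals)
            self.log_like[c] = [math.log(t / denom) for t in totals]
        return self

    def predict_proba(self, x):
        # Class posterior for one count vector, normalised in log space.
        scores = {
            c: self.log_prior[c] + sum(n * w for n, w in zip(x, self.log_like[c]))
            for c in self.classes
        }
        m = max(scores.values())
        z = sum(math.exp(s - m) for s in scores.values())
        return {c: math.exp(s - m) / z for c, s in scores.items()}


def co_train(view1, view2, labels, unlabeled1, unlabeled2, rounds=5, per_round=1):
    """Each view's classifier labels its most confident unlabeled examples,
    which are then added to a labeled pool shared by both views (an assumption)."""
    lab1, lab2, y = list(view1), list(view2), list(labels)
    pool = list(zip(unlabeled1, unlabeled2))
    for _ in range(rounds):
        if not pool:
            break
        clf1 = MultinomialNB().fit(lab1, y)
        clf2 = MultinomialNB().fit(lab2, y)
        picked = {}  # pool index -> predicted label
        for clf, view_idx in ((clf1, 0), (clf2, 1)):
            scored = []
            for i, pair in enumerate(pool):
                proba = clf.predict_proba(pair[view_idx])
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
            # Keep only this classifier's most confident predictions.
            for _conf, i, label in sorted(scored, reverse=True)[:per_round]:
                picked.setdefault(i, label)
        # Move the picked examples (both views) from the pool to the labeled set.
        for i in sorted(picked, reverse=True):
            a, b = pool.pop(i)
            lab1.append(a)
            lab2.append(b)
            y.append(picked[i])
    return MultinomialNB().fit(lab1, y), MultinomialNB().fit(lab2, y)


# Toy data (hypothetical): view 1 = counts of two exonic motifs,
# view 2 = counts of two intronic motifs.
clf1, clf2 = co_train(
    view1=[[3, 0], [0, 3]],
    view2=[[2, 0], [0, 2]],
    labels=["AS", "CONST"],
    unlabeled1=[[4, 1], [1, 4], [2, 0]],
    unlabeled2=[[3, 1], [0, 3], [1, 0]],
)
p1 = clf1.predict_proba([5, 0])
p2 = clf2.predict_proba([0, 5])
print(max(p1, key=p1.get), max(p2, key=p2.get))
```

Other co-training variants differ in how many examples each classifier contributes per round and in whether the labeled pool is shared between the views; the source does not specify these details, so the choices above are placeholders.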

Bibliographic record

Source: IEEE International Conference on Bioinformatics and Biomedicine, 2011, 4 pages
Authors: Tangirala Karthik; Caragea Doina
Original format: PDF
Chinese Library Classification: Q18-53
Keywords: alternative splicing; alternatively spliced and constitutive exons; co-training; semi-supervised learning

Similar literature

1. Predicting alternatively spliced exons using semi-supervised learning [J]. Stanescu Ana, Tangirala Karthik, Caragea Doina. International Journal of Data Mining and Bioinformatics. 2016, Issue 1
2. Semi-supervised learning combining co-training with active learning [J]. Yihao Zhang, Junhao Wen, Xibin Wang. Expert Systems with Applications. 2014, Issue 5
3. Semi-supervised Learning Predicts Approximately One Third of the Alternative Splicing Isoforms as Functional Proteins [J]. Yanqi Hao, Recep Colak, Joan Teyra. Cell Reports. 2015, Issue 2
4. Semi-supervised Learning of Alternatively Spliced Exons Using Co-training [C]. Tangirala Karthik, Caragea Doina. 2011 IEEE International Conference on Bioinformatics and Biomedicine. 2011
5. Exosomal proteome changes induced by human tau expression are modulated by the presence of the microtubule-binding domain and alternatively spliced exons 2-3 in SH-SY5Y neuroblastoma cells [D]. Aladwan, Mohammad Mahmoud. 2017
6. Alternative splicing in the human cytochrome P450IIB6 gene: use of a cryptic exon within intron 3 and splice acceptor site within exon 4 [O]. J S Miles, A W McLaren, F J Gonzalez. 1990
7. An exonic splicing enhancer in human IGF-I pre-mRNA mediates recognition of alternative exon 5 by the serine-arginine protein splicing factor-2/alternative splicing factor [O]. Smith P. J., Spurrell E. L., Coakley J. 2002
