Ensemble-based semi-supervised learning approaches for imbalanced splice site datasets

Abstract

Producing accurate classifiers depends on the quality and quantity of labeled data. The lack of labeled data, which is expensive to generate, critically affects the application of machine learning algorithms to biological problems. Unlabeled data, however, can be acquired faster and in larger quantities thanks to current biochemical technologies known as Next Generation Sequencing. In such cases, when unlabeled instances greatly outnumber labeled instances, semi-supervised learning represents a cost-effective alternative that can improve supervised classifiers by utilizing unlabeled data. In practice, data often exhibit imbalanced class distributions, which are an obstacle for both supervised and semi-supervised learning. The problem of supervised learning from imbalanced datasets has been extensively studied, and various solutions have been proposed to produce classifiers with optimal performance on highly skewed class distributions. For semi-supervised learning, fewer efforts have addressed the imbalanced data problem. In this paper, we study several ensemble-based semi-supervised learning approaches for predicting splice sites, a problem for which the imbalance ratio is very high. We run experiments on five imbalanced datasets with the goal of identifying which variants are the most effective.

Bibliographic details

Source
IEEE International Conference on Bioinformatics and Biomedicine | 2014 | pp. 432-437 | 6 pages
Authors
Stanescu Ana; Caragea Doina
Original format
PDF
Keywords
biology computing; data handling; learning (artificial intelligence); pattern classification; biochemical technologies; biological problems; cost-effective alternative; ensemble-based semisupervised learning; highly skewed class distributions; imbalance data problem; imbalance ratio; imbalanced class distributions; imbalanced splice site datasets; machine learning algorithms; next generation sequencing; optimal performance; supervised classifiers; unlabeled data; unlabeled instances; DNA; Organisms; Proteins; Semisupervised learning; Supervised learning; Support vector machines; Training; ensemble; imbalanced datasets; self-training; semi-supervised learning;
