Development of Supervised Learning Predictive Models for Highly Non-linear Biological, Biomedical, and General Datasets

Medina-Ortiz David; Contreras Sebastián; Quiroz Cristofer; Olivera-Nappa Álvaro

Abstract

In highly non-linear datasets, attributes or features do not readily reveal visual patterns that identify common underlying behaviors. Therefore, classification or regression cannot be achieved using linear or mildly non-linear hyperspace partition functions. Hence, supervised learning models based on most existing algorithms are limited, and their performance metrics are low. Linear transformations of variables, such as principal component analysis, cannot avoid the problem, and even models based on artificial neural networks and deep learning are unable to improve the metrics. Sometimes, even when features allow classification or regression in reported cases, performance metrics of supervised learning algorithms remain unsatisfactorily low. This problem recurs in many areas of study, for example the clinical, biotechnological, and protein engineering areas, where many of the attributes are correlated in an unknown and very non-linear fashion or are categorical and difficult to relate to a target response variable. In such areas, being able to create predictive models would dramatically improve the quality of their outcomes, generating immediate added value for both the scientific community and the general public. In this manuscript, we present RV-Clustering, a library of unsupervised learning algorithms, and a new methodology designed to find optimum partitions within highly non-linear datasets that allow deconvoluting variables and notably improving performance metrics in supervised learning classification or regression models. The partitions obtained are statistically cross-validated, ensuring correct representativity and no over-fitting. We have successfully tested RV-Clustering on several highly non-linear datasets of different origins.
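The general idea described in the abstract, partitioning a dataset first and then fitting a supervised model per partition, can be illustrated with a minimal sketch. This is not RV-Clustering and does not use its API. The toy dataset (y = |x|), the hand-written two-cluster k-means, and the helper names `fit_line`, `r_squared`, and `two_means` are all assumptions made for illustration, using only the Python standard library.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def r_squared(ys, preds):
    """Coefficient of determination of predictions against targets."""
    my = mean(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return 1.0 - ss_res / ss_tot


def two_means(xs, iterations=20):
    """Minimal 1-D k-means with k=2; returns a 0/1 label per point."""
    centers = [min(xs), max(xs)]
    labels = [0] * len(xs)
    for _ in range(iterations):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in xs]
        for k in (0, 1):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return labels


# Toy data (hypothetical, for illustration only): y = |x| on an even grid.
n = 200
xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
ys = [abs(x) for x in xs]

# Baseline: a single linear model for the whole dataset.
a, b = fit_line(xs, ys)
global_r2 = r_squared(ys, [a * x + b for x in xs])

# Partition first (unsupervised), then fit one linear model per partition.
labels = two_means(xs)
preds = [0.0] * n
for k in (0, 1):
    idx = [i for i, lab in enumerate(labels) if lab == k]
    ak, bk = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    for i in idx:
        preds[i] = ak * xs[i] + bk
partitioned_r2 = r_squared(ys, preds)

print(f"global R^2: {global_r2:.3f}")
print(f"partitioned R^2: {partitioned_r2:.3f}")
```

On this toy problem the single linear model explains almost none of the variance, while the per-partition models fit each branch closely. The sketch scores on the training data only; the abstract states that the partitions in the actual methodology are statistically cross-validated, which this example does not attempt.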


Bibliographic record

Source: Frontiers in Molecular Biosciences, 2020, No. 6, 16 pages
Authors: Medina-Ortiz David; Contreras Sebastián; Quiroz Cristofer; Olivera-Nappa Álvaro
Original format: PDF
Keywords: highly non-linear datasets; clustering; statistical techniques; recursive binary methods; supervised learning algorithms

