Journal: Frontiers in Molecular Biosciences

Development of Supervised Learning Predictive Models for Highly Non-linear Biological, Biomedical, and General Datasets

Abstract

In highly non-linear datasets, the attributes or features do not readily reveal visual patterns that identify common underlying behaviors. As a result, classification or regression cannot be achieved with linear or mildly non-linear hyperspace partition functions. Supervised learning models built with most existing algorithms are therefore limited, and their performance metrics are low. Linear transformations of variables, such as principal component analysis, cannot avoid the problem, and even models based on artificial neural networks and deep learning are unable to improve the metrics. Sometimes, even when the features allow classification or regression in reported cases, the performance metrics of supervised learning algorithms remain unsatisfactorily low. This problem recurs in many areas of study, for example the clinical, biotechnological, and protein engineering areas, where many of the attributes are correlated in an unknown and highly non-linear fashion, or are categorical and difficult to relate to a target response variable. In such areas, the ability to create predictive models would dramatically improve the quality of outcomes, generating immediate added value for both the scientific community and the general public. In this manuscript, we present RV-Clustering, a library of unsupervised learning algorithms, together with a new methodology designed to find optimum partitions within highly non-linear datasets. These partitions allow deconvoluting variables and notably improving performance metrics in supervised learning classification or regression models. The partitions obtained are statistically cross-validated, ensuring correct representativity and no over-fitting. We have successfully tested RV-Clustering on several highly non-linear datasets of different origins.
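
The general idea of partitioning a dataset before fitting supervised models can be illustrated with a minimal sketch. This is not the RV-Clustering library or its API: the helper names (`fit_line`, `kmeans_1d`, `nearest`), the choice of k-means as the unsupervised step, the synthetic data y = |x| + noise, and the simple hold-out split (used here in place of the statistical cross-validation the abstract mentions) are all assumptions made for illustration. It compares one linear model fitted to the whole dataset with one linear model fitted per partition.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on 1-D data; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def predict_line(model, x):
    a, b = model
    return a * x + b


def r2_score(ys, preds):
    """Coefficient of determination of predictions against true values."""
    my = statistics.fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))


def kmeans_1d(xs, k=2, iters=50):
    """Lloyd's algorithm on scalars with evenly spaced initial centroids."""
    lo, hi = min(xs), max(xs)
    centroids = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(x, centroids)].append(x)
        # Keep the old centroid if a group happens to be empty.
        centroids = [statistics.fmean(g) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


# Synthetic toy data (an assumption, not taken from the paper).
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(400)]
ys = [abs(x) + rng.gauss(0.0, 0.05) for x in xs]
train_x, test_x = xs[0::2], xs[1::2]
train_y, test_y = ys[0::2], ys[1::2]

# Baseline: a single linear model over the whole training set.
global_model = fit_line(train_x, train_y)
r2_global = r2_score(test_y, [predict_line(global_model, x) for x in test_x])

# Partition with an unsupervised step, then fit one model per partition.
centroids = kmeans_1d(train_x, k=2)
models = []
for i in range(len(centroids)):
    members = [(x, y) for x, y in zip(train_x, train_y)
               if nearest(x, centroids) == i]
    models.append(fit_line([x for x, _ in members], [y for _, y in members]))

# Each held-out point is routed to the model of its nearest centroid.
preds = [predict_line(models[nearest(x, centroids)], x) for x in test_x]
r2_partitioned = r2_score(test_y, preds)

print(f"R^2, single model:        {r2_global:.3f}")
print(f"R^2, per-partition models: {r2_partitioned:.3f}")
```

On this toy problem a single line cannot follow the V-shaped response, while each partition is close to linear on its own, so the held-out score of the partitioned models should be markedly higher. Real datasets of the kind described in the abstract are multi-dimensional and would need a more careful choice of partitioning algorithm and validation scheme.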
