
Kernel partial least squares (K-PLS) for scientific data mining.



Abstract

The aim of this dissertation is the use of kernel partial least squares (K-PLS) for scientific data mining. K-PLS is a machine learning technique that applies the kernel trick to partial least squares, a statistical technique commonly used for collinear data problems in chemometrics and drug design. It can be shown that K-PLS is closely related to modern machine learning techniques such as support vector machines and can also be interpreted as a neural network. Learning is a broad concept and can commonly be divided into four complex-systems tasks: (1) problem representation, (2) data preprocessing, (3) predictive modeling, and (4) variable and feature selection. Each of these components contributes to model transparency and prediction performance.

For the preprocessing part, a basic data transformation technique, Principal Component Analysis (PCA), has been extended to Independent Component Analysis (ICA). The ICA Transform (ICAT) and ICA-based data cleansing have been introduced. In addition, a novel kernel centering algorithm has been introduced.

In the machine learning part, the SUpport vector Parsimonious ANOVA (SUPANOVA) transparent (reversible) spline kernel has been implemented to improve the causality analysis of the model. The proposed new spline kernel has also been integrated into the K-PLS framework. The K-PLS algorithm has also been extended so that it can be implemented with any loss function for multiple responses. Additionally, Rényi's quadratic entropy loss function has been used to deal with unbalanced classification problems.

Two new variable selection algorithms have been introduced in this thesis: (1) feature selection based on sigma-tuning of the Gaussian kernel, and (2) Random Forests feature selection. These variable selection methods have been demonstrated on benchmark data sets and compared with other feature selection methods based on sensitivity analysis and Z-scores.

Finally, these methodologies have been applied to three different scientific data mining problems: (1) predicting ischemia from magnetocardiogram data; (2) Quantitative Structure-Activity Relationship (QSAR) drug design for the discovery of novel pharmaceuticals; and (3) identification of trace materials from terahertz spectra.
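Since the abstract describes K-PLS only at a high level (the kernel trick applied to partial least squares, together with kernel centering), a minimal NumPy sketch of a dual NIPALS-style kernel PLS is given below for orientation. It follows the standard published formulation of kernel PLS rather than the dissertation's own implementation; the kernel centering shown is the usual textbook version, not the novel centering algorithm mentioned in the abstract, and all function and variable names are illustrative.

```python
# Illustrative dual NIPALS kernel PLS sketch in NumPy -- not the dissertation's code.
import numpy as np

def center_train_kernel(K):
    """Textbook centering of an n x n training kernel matrix."""
    n = K.shape[0]
    J = np.ones((n, n)) / n
    return K - J @ K - K @ J + J @ K @ J

def center_test_kernel(K_t, K):
    """Center an n_test x n test-vs-train kernel block consistently with training."""
    n = K.shape[0]
    J = np.ones((n, n)) / n
    J_t = np.ones((K_t.shape[0], n)) / n
    return K_t - J_t @ K - K_t @ J + J_t @ K @ J

def kpls_fit(Kc, Y, n_components, n_iter=500, tol=1e-10):
    """Dual NIPALS K-PLS on a centered kernel Kc (n x n) and responses Y (n x m)."""
    n = Kc.shape[0]
    Kd, Yd = Kc.copy(), Y.copy()          # deflated working copies
    T = np.zeros((n, n_components))       # latent score vectors
    U = np.zeros((n, n_components))       # response-side weight vectors
    I = np.eye(n)
    for a in range(n_components):
        u = Yd[:, [0]].copy()
        for _ in range(n_iter):
            t = Kd @ u
            t /= np.linalg.norm(t)
            c = Yd.T @ t                  # response loadings
            u_new = Yd @ c
            u_new /= np.linalg.norm(u_new)
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        T[:, [a]], U[:, [a]] = t, u
        D = I - t @ t.T                   # deflate kernel and responses
        Kd = D @ Kd @ D
        Yd = D @ Yd
    return T, U

def kpls_predict(Kc_new, Kc_train, Y, T, U):
    """Predict responses; Kc_new is the centered kernel of new points vs. training points."""
    B = U @ np.linalg.pinv(T.T @ Kc_train @ U) @ (T.T @ Y)   # dual regression coefficients
    return Kc_new @ B
```

In this sketch, a Gaussian (RBF) kernel would be computed and centered before calling kpls_fit; its width sigma is then the hyperparameter that the sigma-tuning feature selection mentioned in the abstract would vary.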

Bibliographic record

  • Author

    Han, Long.

  • Affiliation

    Rensselaer Polytechnic Institute.

  • Degree grantor Rensselaer Polytechnic Institute.
  • Subject Engineering, Industrial; Operations Research.
  • Degree Ph.D.
  • Year 2007
  • Pages 151 p.
  • Total pages 151
  • Format PDF
  • Language eng
  • CLC classification General industrial technology; Operations research
  • Keywords

