
Data fusion in scientific data mining.


Abstract

Data fusion combines multiple sources, or multiple representations of a single source, to perform inferences that are more comprehensive and accurate than those of any single method. It thus creates a synergistic process in which consolidating individual data yields a combined resource whose productive value is greater than the sum of its parts.

While considerable research has been done on data fusion, most of it has been in the field of multi-sensor fusion; relatively little work has been conducted in a data mining context. The goal of this dissertation is to develop a data fusion framework for predictive modeling, especially for Quantitative Structure-Activity Relationship (QSAR) problems. The framework includes a function-oriented model, general architecture paradigms, and corresponding learning algorithms. Furthermore, kernel methods, e.g. kernel partial least squares (K-PLS) ensembles with bagging and boosting, are introduced as an important decision-level fusion method. This approach can be applied wherever multiple data sources are available, to obtain information of greater quality. In addition to the three predefined fusion levels, a kernel fusion method is developed based on the properties of kernels in the feature space; it takes advantage of multiple physically different feature sets to build more accurate and robust predictive models. Because of its Hessian-free and self-correcting properties, the BFGS quasi-Newton method is employed for parameter tuning in kernel fusion.

In addition to the regression algorithm applied in the data fusion scheme above, we also extend K-PLS to classification, especially for cases where the class distribution is highly skewed or changes dramatically over time. In such cases, a probabilistic classifier capable of high-dimensional discrimination is desirable. To this end, a new kernel orthonormalized PLS logistic regression (KOPLS-LR) and a corresponding ROC-based adaptive threshold (ROC-BAT) approach are proposed. KOPLS-LR inherits the advantages of K-PLS and logistic regression, while ROC-BAT provides an effective way to predict observations whose distribution changes dramatically over time.

During the research, a web-based modeling system was designed and implemented that integrates various learning methods, e.g. PLS, K-PLS, and support vector machines (SVM). Model selection (parameter tuning) and performance estimation are also integrated into this online predictive modeling system. The tool is open to the public and can be accessed at: http://reccr.chem.rpi.edu/Software/modeling/index.html
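The abstract does not say how ROC-BAT chooses its threshold. As a rough illustration of the general idea of an ROC-derived cutoff that is recomputed as data drift, the sketch below picks the cutoff that maximizes Youden's J (true positive rate minus false positive rate) on each labeled batch. The criterion, the function names `youden_threshold` and `adaptive_thresholds`, and the batch-wise refresh are all assumptions made for this example, not the dissertation's method.

```python
def youden_threshold(scores, labels):
    """Return (cutoff, J) where the cutoff maximizes J = TPR - FPR.

    Assumed criterion for illustration only; labels are 0 or 1.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_cutoff, best_j = None, -1.0
    # Every distinct score is a candidate cutoff: predict positive when score >= cutoff.
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:  # strict comparison keeps the lowest cutoff among ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def adaptive_thresholds(score_batches, label_batches):
    """Recompute the cutoff on each labeled batch so it can follow distribution drift."""
    return [youden_threshold(s, y)[0] for s, y in zip(score_batches, label_batches)]


# Hypothetical classifier scores for two time periods; the second is shifted upward.
batch_1 = ([0.1, 0.2, 0.35, 0.4, 0.8, 0.9], [0, 0, 0, 1, 1, 1])
batch_2 = ([0.5, 0.6, 0.7, 0.75, 0.95], [0, 0, 1, 1, 1])
cutoffs = adaptive_thresholds([batch_1[0], batch_2[0]], [batch_1[1], batch_2[1]])
```

A fixed cutoff of 0.4 would label every observation in the second batch as positive, while the recomputed cutoff moves with the scores; that is the motivation for tying the threshold to the ROC curve of recent data.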

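Bibliographic record

  • Author

    Huang, Changjian

  • Author affiliation

    Rensselaer Polytechnic Institute

  • Degree-granting institution: Rensselaer Polytechnic Institute
  • Subject: Statistics; Engineering System Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 145 p.
  • Total pages: 145
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords:

  • Date indexed: 2022-08-17 11:37:45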
