Feature selection and statistical alternatives for machine learning applied to in-silico drug design.

Abstract

Feature selection has recently been the subject of intensive research in data mining, especially for datasets with a large number of descriptive attributes, such as QSAR (Quantitative Structure-Activity Relationship) data. QSAR is an in-silico drug design methodology that requires identifying the important features of molecules that explain a relevant drug property. A typical QSAR dataset for predicting an activity of interest is characterized by a large number of descriptive features (300–1000) for a relatively small number of compounds (molecules).

Finding the best feature subset for a given problem with N features requires evaluating all 2^N possible subsets. The best feature subset also depends on the predictive modeling method that will be employed to predict future unknown values of the response variables of interest. Feature selection involves minimizing the number of features retained while maximizing the predictive power of the model. From this point of view, feature selection can be viewed as a special type of multi-objective optimization problem.

This dissertation proposes machine learning algorithms as predictive modeling tools for QSAR problems and develops a novel approach to feature selection based on feature saliency. This approach is computationally less expensive than other machine learning feature selection methods (e.g., weight pruning for ANNs), and it works for any nonparametric regression algorithm.
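To make two points of the abstract concrete, here is a minimal Python sketch using only the standard library. It is not the dissertation's method: the abstract does not describe how feature saliency is computed, so the sketch substitutes a generic permutation-based sensitivity score as a stand-in. The function names (`subset_count`, `knn_predict`, `permutation_saliency`), the k-nearest-neighbor regressor, and the synthetic data are all assumptions made for illustration.

```python
import random


def subset_count(n_features):
    """Number of feature subsets (including the empty set) for n features."""
    return 2 ** n_features


def knn_predict(train_X, train_y, x, k=3):
    """Nonparametric regression: average the k nearest training responses."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(predict, X, y):
    """Mean squared error of a black-box predictor on (X, y)."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_saliency(predict, X, y, rng):
    """Increase in MSE when each feature column is shuffled in turn.

    Stand-in saliency measure (an assumption, not the author's definition).
    It only calls predict(), so it works with any regression model.
    """
    baseline = mse(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        scores.append(mse(predict, shuffled, y) - baseline)
    return scores


rng = random.Random(0)

# Synthetic data (not from the dissertation): only feature 0 drives the response.
X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(120)]
y = [3.0 * row[0] + rng.gauss(0, 0.05) for row in X]
train_X, train_y = X[:80], y[:80]
test_X, test_y = X[80:], y[80:]


def predict(row):
    return knn_predict(train_X, train_y, row)


saliency = permutation_saliency(predict, test_X, test_y, rng)
ranking = sorted(range(len(saliency)), key=lambda j: saliency[j], reverse=True)

print("subsets for 10 features:", subset_count(10))
print("digits in subset count for 300 features:", len(str(subset_count(300))))
print("features ranked by saliency:", ranking)
```

The score costs one extra pass over the data per feature, so the work grows linearly with N, whereas exhaustive subset search grows as 2^N. That contrast is the reason saliency-style rankings are attractive when N is in the hundreds, as in the QSAR datasets described above.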

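Bibliographic record

  • Author

    Arciniegas, Fabio Andres

  • Author affiliation

    Rensselaer Polytechnic Institute

  • Degree-granting institution: Rensselaer Polytechnic Institute
  • Subject: Operations Research; Engineering, Industrial
  • Degree: Ph.D.
  • Year: 2002
  • Pages: 250 p.
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Operations research; General industrial technology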