
Evolutionary computing for feature selection and predictive data mining.


Abstract

Feature selection has recently been the subject of intensive research in data mining, especially for datasets with a large number of descriptive attributes, such as QSAR (Quantitative Structure-Activity Relationship) data. QSAR is an in-silico drug design methodology that requires identifying the important features of molecules that explain a drug-relevant activity of interest. A typical QSAR dataset for predicting an activity of interest is characterized by a large number of descriptive features (300–1000) for a relatively small number of compounds (typically around 50–500).

Finding the best feature subset for a given problem with N features by exhaustive search requires evaluating all 2^N possible subsets. The best feature subset also depends on the predictive model that will be used to predict future unknown values of the response variables of interest. Feature selection involves minimizing the number of features retained while maximizing the predictive power of the model. From this point of view, feature selection can be viewed as a special type of multi-objective optimization problem.

Evolutionary computing can be applied to problems where traditional methods are hard to apply or lead to unsatisfactory solutions (e.g. local optima). The methods of evolutionary computation are stochastic, and their search methods imitate and model two phenomena from nature and evolution: (i) the survival of the fittest and (ii) genetic inheritance. This dissertation addresses evolutionary algorithms for feature selection and predictive modeling for QSAR data sets.
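To make the idea concrete, here is a minimal sketch of evolutionary feature selection in Python. It is not the dissertation's algorithm: the synthetic data, the k-nearest-neighbour model, the penalty weight, and every function name (`make_toy_data`, `loo_knn_error`, `fitness`, `select_features`) are assumptions chosen for illustration. Each candidate subset is a bit mask over the features, and the two objectives (small subset, low prediction error) are folded into a single cost with a size penalty. For scale, with N = 300 features exhaustive search would face 2^300 subsets, which is more than 10^90.

```python
import random

def make_toy_data(n_samples=40, n_features=12, informative=(0, 3, 7), seed=0):
    """Synthetic stand-in for a QSAR table: only `informative` columns drive y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [sum(row[j] for j in informative) + rng.gauss(0.0, 0.1) for row in X]
    return X, y

def loo_knn_error(X, y, mask, k=3):
    """Leave-one-out mean squared error of k-NN regression on the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("inf")  # an empty subset cannot predict anything
    total = 0.0
    for i, xi in enumerate(X):
        # Squared distances from sample i to every other sample, nearest first.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        pred = sum(target for _, target in dists[:k]) / k
        total += (pred - y[i]) ** 2
    return total / len(X)

def fitness(X, y, mask, penalty=0.02):
    """Scalarized two-objective cost (lower is better): error plus a size penalty."""
    return loo_knn_error(X, y, mask) + penalty * sum(mask)

def select_features(X, y, pop_size=20, generations=15, p_mut=0.05, seed=1):
    rng = random.Random(seed)
    n = len(X[0])
    # Random masks plus the all-features mask as a baseline individual.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    pop.append([1] * n)
    cache = {}

    def cost(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask)
        return cache[key]

    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]          # (i) survival of the fittest
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)             # (ii) inheritance via one-point crossover
            child = mom[:cut] + dad[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            children.append(child)
        pop = survivors + children                # the best mask so far is always kept
    best = min(pop, key=cost)
    return best, cost(best)

if __name__ == "__main__":
    X, y = make_toy_data()
    best_mask, best_cost = select_features(X, y)
    print("selected columns:", [j for j, bit in enumerate(best_mask) if bit])
    print("cost:", round(best_cost, 4))
```

Because the starting population includes the all-features mask and the best individual is carried over unchanged each generation, the returned cost can never be worse than that of using every feature. A true multi-objective treatment would keep a Pareto front of (error, subset size) pairs rather than a single weighted cost; the penalty term here is only the simplest way to show the trade-off.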

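Bibliographic record

  • Author

    Ozdemir, Muhsin

  • Author affiliation

    Rensselaer Polytechnic Institute

  • Degree-granting institution: Rensselaer Polytechnic Institute
  • Subject: Statistics; Chemistry, Pharmaceutical
  • Degree: Ph.D.
  • Year: 2002
  • Pages: 271
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Statistics; Medicinal chemistry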